Image encoding device and method and image decoding device and method

ABSTRACT

The present disclosure relates to an image encoding device and method and an image decoding device and method, which are capable of suppressing an increase in encoding or decoding workload. A current layer of image data including a plurality of layers is encoded and/or decoded with reference to encoding-related information of some areas of another layer encoded for each of a plurality of certain areas obtained by dividing a picture, according to control of control information used to control the certain area in which the encoding-related information of the other layer is referred to regarding the current layer of the image data. The present disclosure can be applied to image processing devices such as an image encoding device for performing scalable coding on image data and an image decoding device for decoding encoded data obtained by performing scalable coding on image data.

CROSS REFERENCE TO PRIOR APPLICATION

This application is a continuation of U.S. patent application Ser. No. 17/237,661 (filed on Apr. 22, 2021), which is a continuation of U.S. patent application Ser. No. 14/773,834 (filed on Sep. 9, 2015), which is a National Stage Patent Application of PCT International Patent Application No. PCT/JP2014/056311 (filed on Mar. 11, 2014) under 35 U.S.C. § 371, which claims priority to Japanese Patent Application No. 2013-058679 (filed on Mar. 21, 2013), which are all hereby incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to an image encoding device and method and an image decoding device and method, and more particularly, to an image encoding device and method and an image decoding device and method, which are capable of suppressing an increase in encoding or decoding workload.

BACKGROUND ART

Recently, devices for compressing and encoding an image by adopting an encoding scheme of handling image information digitally and performing compression by an orthogonal transform such as a discrete cosine transform and motion compensation using image information-specific redundancy, for the purpose of transmitting and accumulating information with high efficiency when the image information is handled digitally, have become widespread. Moving Picture Experts Group (MPEG) and the like are examples of such encoding schemes.

Particularly, MPEG 2 (ISO/IEC 13818-2) is a standard that is defined as a general-purpose image encoding scheme, and covers interlaced scan images, progressive scan images, standard resolution images, and high definition images. For example, MPEG 2 is now being widely used in a wide range of applications such as professional use and consumer use. Using the MPEG 2 compression scheme, for example, in the case of an interlaced scan image of a standard resolution having 720×480 pixels, a coding amount (bit rate) of 4 to 8 Mbps is allocated. Further, using the MPEG 2 compression scheme, for example, in the case of an interlaced scan image of a high resolution having 1920×1088 pixels, a coding amount (bit rate) of 18 to 22 Mbps is allocated. Thus, it is possible to implement a high compression rate and a preferable image quality.

MPEG 2 is mainly intended for high definition coding suitable for broadcasting but does not support an encoding scheme having a coding amount (bit rate) lower than that of MPEG 1, that is, an encoding scheme of a high compression rate. With the spread of mobile terminals, it is considered that the need for such an encoding scheme will increase in the future, and thus an MPEG 4 encoding scheme has been standardized. An international standard for an image encoding scheme was approved as ISO/IEC 14496-2 in December, 1998.

Further, in recent years, standards such as H.26L (International Telecommunication Union Telecommunication Standardization Sector Q6/16 Video Coding Expert Group (ITU-T Q6/16 VCEG)) for the purpose of image encoding for video conferences have been standardized. H.26L requires a larger computation amount for encoding and decoding than in existing encoding schemes such as MPEG 2 or MPEG 4, but is known to implement high encoding efficiency. Further, currently, as one activity of MPEG 4, standardization of incorporating even a function that is not supported in H.26L and implementing high encoding efficiency based on H.26L has been performed as a Joint Model of Enhanced-Compression Video Coding.

Based on this standardization schedule, an international standard called H.264 and MPEG-4 Part 10 (Advanced Video Coding, hereinafter referred to as "AVC") was established in March, 2003.

Furthermore, as an extension of H.264/AVC, Fidelity Range Extension (FRExt), including encoding tools necessary for professional use such as RGB, 4:2:2, and 4:4:4, as well as the 8×8 DCT and quantization matrices specified in MPEG-2, was standardized in February, 2005. As a result, H.264/AVC has become an encoding scheme capable of also expressing film noise included in movies well and is being used in a wide range of applications such as Blu-Ray Discs (trademark).

However, in recent years, there is an increasing need for high compression rate encoding capable of compressing an image of about 4000×2000 pixels, which is 4 times that of a high-definition image, or delivering a high-definition image in a limited transmission capacity environment such as the Internet. To this end, improvements in encoding efficiency have been under continuous review by the Video Coding Experts Group (VCEG) under ITU-T.

In this regard, currently, in order to further improve the encoding efficiency to be higher than in AVC, Joint Collaboration Team-Video Coding (JCTVC), which is a joint standardization organization of ITU-T and ISO/IEC, has been standardizing an encoding scheme called High Efficiency Video Coding (HEVC). A committee draft that is a draft specification for the HEVC standard was issued in January, 2013 (see Non-Patent Literature 1).

In HEVC, it is possible to perform parallel processing based on a tile or wavefront parallel processing in addition to a slice that is also defined in AVC.

Moreover, the existing image encoding schemes such as MPEG-2 and AVC have a scalability function of dividing an image into a plurality of layers and encoding the plurality of layers.

In other words, for example, for a terminal having a low processing capability such as a mobile phone, image compression information of only a base layer is transmitted, and a moving image of low spatial and temporal resolutions or a low quality is reproduced, and for a terminal having a high processing capability such as a television or a personal computer, image compression information of an enhancement layer as well as a base layer is transmitted, and a moving image of high spatial and temporal resolutions or a high quality is reproduced. That is, image compression information according to a capability of a terminal or a network can be transmitted from a server without performing the transcoding process.

CITATION LIST Non-Patent Literature

Non-Patent Literature 1: Benjamin Bross, Woo-Jin Han, Jens-Rainer Ohm, Gary J. Sullivan, Ye-Kui Wang, Thomas Wiegand, "High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Consent)," JCTVC-L1003_v4, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Geneva, CH, 14-23 Jan. 2013

SUMMARY OF INVENTION Technical Problem

However, in the method of the related art, when encoding-related information of the base layer, such as decoded image information or motion information, is referred to in encoding and decoding of the enhancement layer, the entire picture of the base layer was a target for reference.

For this reason, the workload was likely to increase; for example, in encoding and decoding of the enhancement layer, the number of memory accesses for referring to the encoding-related information of the base layer increases.

The present disclosure has been made in light of the foregoing, and it is desirable to suppress an increase in encoding or decoding workload.

Solution to Problem

According to an embodiment of the present technology, there is provided an image encoding device including: a generation section configured to generate control information used to control a certain area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture, is referred to regarding a current layer of image data including a plurality of layers; an encoding section configured to encode the current layer of the image data with reference to the encoding-related information of some areas of the other layer according to control of the control information generated by the generation section; and a transmission section configured to transmit encoded data of the image data generated by the encoding section and the control information generated by the generation section.

The control information may be information limiting an area in which the encoding-related information is referred to by designating an area in which reference to the encoding-related information of the other layer is permitted, designating an area in which reference to the encoding-related information is prohibited, or designating an area in which the encoding-related information is referred to.

The control information may designate the area using an identification number allocated in a raster scan order, information indicating positions of the area in vertical and horizontal directions in a picture, or information indicating a data position of the area in the encoded data.

The transmission section may further transmit information indicating whether or not to control an area in which the encoding-related information is referred to.

The encoding-related information may be information used for generation of a prediction image used in encoding of the image data.

The information used for the generation of the prediction image may include information used for texture prediction of the image data and information used for syntax prediction of the image data. The control information may be information used to independently control an area in which the information used for the texture prediction is referred to and an area in which the information used for the syntax prediction is referred to.

The generation section may generate the control information for each of the plurality of certain areas obtained by dividing the picture of the current layer of the image data. The encoding section may encode the current layer of the image data with reference to the encoding-related information of some areas of the other layer for each of the areas according to control of the control information of each area generated by the generation section.

The transmission section may further transmit information indicating whether or not an area division of the current layer is similar to an area division of the other layer.

The area may be a slice or a tile of the image data.

According to an embodiment of the present technology, there is provided an image encoding method including: generating control information used to control a certain area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture, is referred to regarding a current layer of image data including a plurality of layers; encoding the current layer of the image data with reference to the encoding-related information of some areas of the other layer according to control of the generated control information; and transmitting encoded data generated by encoding the image data and the generated control information.

According to another embodiment of the present technology, there is provided an image decoding device including: a reception section configured to receive encoded data of a current layer of image data including a plurality of layers and control information used to control a certain area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture of the image data, is referred to; and a decoding section configured to decode the encoded data with reference to the encoding-related information of some areas of the other layer according to control of the control information received by the reception section.

The control information may be information limiting an area in which the encoding-related information is referred to by designating an area in which reference to the encoding-related information of the other layer is permitted, designating an area in which reference to the encoding-related information is prohibited, or designating an area in which the encoding-related information is referred to.

The control information may designate the area using an identification number allocated in a raster scan order, information indicating positions of the area in vertical and horizontal directions in a picture, or information indicating a data position of the area in the encoded data.

The reception section may further receive information indicating whether or not to control an area in which the encoding-related information is referred to.

The encoding-related information may be information used for generation of a prediction image used in decoding of the encoded data.

The information used for the generation of the prediction image may include information used for texture prediction of the image data and information used for syntax prediction of the image data. The control information may be information used to independently control an area in which the information used for the texture prediction is referred to and an area in which the information used for the syntax prediction is referred to.

The reception section may receive the encoded data encoded for each of the plurality of certain areas obtained by dividing the picture of the current layer of the image data and the control information of each of the areas. The decoding section may decode the encoded data received by the reception section with reference to the encoding-related information of some areas of the other layer for each of the areas according to control of the control information of each area.

The reception section may further receive information indicating whether or not an area division of the current layer is similar to an area division of the other layer.

The area may be a slice or a tile of the image data.

According to another embodiment of the present technology, there is provided an image decoding method including: receiving encoded data of a current layer of image data including a plurality of layers and control information used to control a certain area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture of the image data, is referred to; and decoding the encoded data with reference to the encoding-related information of some areas of the other layer according to control of the received control information.

According to one aspect of the present technology, control information used to control an area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture, is referred to regarding a current layer of image data including a plurality of layers is generated, the current layer of the image data is encoded with reference to the encoding-related information of some areas of the other layer according to control of the generated control information, and encoded data generated by encoding the image data and the generated control information is transmitted.

According to another aspect of the present technology, encoded data of a current layer of image data including a plurality of layers and control information used to control an area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture of the image data, is referred to are received, and the encoded data is decoded with reference to the encoding-related information of some areas of the other layer according to control of the received control information.

Advantageous Effects of Invention

According to the present disclosure, it is possible to encode and decode an image. Particularly, it is possible to suppress an increase in encoding or decoding workload.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram for describing an example of a configuration of a coding unit.

FIG. 2 is a diagram illustrating an example of a scalable layered image encoding scheme.

FIG. 3 is a diagram for describing an example of spatial scalable coding.

FIG. 4 is a diagram for describing an example of temporal scalable coding.

FIG. 5 is a diagram for describing an example of scalable coding of a signal to noise ratio.

FIG. 6 is a diagram for describing an example of a slice.

FIG. 7 is a diagram for describing an example of a tile.

FIG. 8 is a diagram for describing an example of base layer reference control.

FIG. 9 is a diagram for describing an example of a tile setting.

FIG. 10 is a diagram for describing another example of base layer reference control.

FIG. 11 is a diagram for describing an example of a parallel process.

FIG. 12 is a diagram for describing an example of a method of allocating an identification number of a tile.

FIG. 13 is a diagram for describing an example of syntax of a picture parameter set.

FIG. 14 is a continuation from FIG. 13 for describing an example of syntax of a picture parameter set.

FIG. 15 is a diagram for describing an example of syntax of a slice header.

FIG. 16 is a continuation from FIG. 15 for describing an example of syntax of a slice header.

FIG. 17 is a continuation from FIG. 16 for describing an example of syntax of a slice header.

FIG. 18 is a block diagram illustrating an example of a main configuration of an image encoding device.

FIG. 19 is a block diagram illustrating an example of a main configuration of a base layer image encoding section.

FIG. 20 is a block diagram illustrating an example of a main configuration of an enhancement layer image encoding section.

FIG. 21 is a block diagram illustrating an example of a main configuration of an area synchronization section.

FIG. 22 is a flowchart for describing an example of the flow of an image encoding process.

FIG. 23 is a flowchart for describing an example of the flow of a base layer encoding process.

FIG. 24 is a flowchart for describing an example of the flow of an enhancement layer encoding process.

FIG. 25 is a flowchart for describing an example of the flow of an enhancement layer encoding process, continuing from FIG. 24.

FIG. 26 is a block diagram illustrating an example of a main configuration of an image decoding device.

FIG. 27 is a block diagram illustrating an example of a main configuration of a base layer image decoding section.

FIG. 28 is a block diagram illustrating an example of a main configuration of an enhancement layer image decoding section.

FIG. 29 is a block diagram illustrating an example of a main configuration of an area synchronization section.

FIG. 30 is a flowchart for describing an example of the flow of an image decoding process.

FIG. 31 is a flowchart for describing an example of the flow of a base layer decoding process.

FIG. 32 is a flowchart for describing an example of the flow of an enhancement layer decoding process.

FIG. 33 is a flowchart for describing an example of the flow of an enhancement layer decoding process, continuing from FIG. 32.

FIG. 34 is a diagram illustrating an example of a multi-view image encoding scheme.

FIG. 35 is a diagram illustrating an example of a main configuration of a multi-view image encoding device to which the present disclosure is applied.

FIG. 36 is a diagram illustrating an example of a main configuration of a multi-view image decoding device to which the present disclosure is applied.

FIG. 37 is a block diagram illustrating an example of a main configuration of a computer.

FIG. 38 is a block diagram illustrating an example of a schematic configuration of a television device.

FIG. 39 is a block diagram illustrating an example of a schematic configuration of a mobile phone.

FIG. 40 is a block diagram illustrating an example of a schematic configuration of a recording/reproduction device.

FIG. 41 is a block diagram illustrating an example of a schematic configuration of an image capturing device.

FIG. 42 is a block diagram illustrating an example of using scalable coding.

FIG. 43 is a block diagram illustrating another example of using scalable coding.

FIG. 44 is a block diagram illustrating another example of using scalable coding.

FIG. 45 is a block diagram illustrating an example of a schematic configuration of a video set.

FIG. 46 is a block diagram illustrating an example of a schematic configuration of a video processor.

FIG. 47 is a block diagram illustrating another example of a schematic configuration of a video processor.

FIG. 48 is an explanatory diagram illustrating a configuration of a content reproducing system.

FIG. 49 is an explanatory diagram illustrating the flow of data in a content reproducing system.

FIG. 50 is an explanatory diagram illustrating a specific example of an MPD.

FIG. 51 is a functional block diagram illustrating a configuration of a content server of a content reproducing system.

FIG. 52 is a functional block diagram illustrating a configuration of a content reproducing device of a content reproducing system.

FIG. 53 is a functional block diagram illustrating a configuration of a content server of a content reproducing system.

FIG. 54 is a sequence chart illustrating a communication processing example by respective devices of a wireless communication system.

FIG. 55 is a sequence chart illustrating a communication processing example by respective devices of a wireless communication system.

FIG. 56 is a diagram schematically illustrating an example of a configuration of a frame format transmitted and received in a communication process by respective devices of a wireless communication system.

FIG. 57 is a sequence chart illustrating a communication processing example by respective devices of a wireless communication system.

DESCRIPTION OF EMBODIMENTS

Hereinafter, modes (hereinafter, referred to as "embodiments") for carrying out the present disclosure will be described. A description will proceed in the following order.

1. Main description of present technology

2. First embodiment (image encoding device)

3. Second embodiment (image decoding device)

4. Third embodiment (multi-view image encoding device and multi-view image decoding device)

5. Fourth embodiment (computer)

6. Application examples

7. Application examples of scalable coding

8. Fifth embodiment (set, unit, module, and processor)

9. Application example of content reproducing system of MPEG-DASH

10. Application example of wireless communication system of Wi-Fi standard

1. Main Description for Present Technology

<Overview>

[Encoding Scheme]

Hereinafter, the present technology will be described in connection with an application to image encoding and decoding of a High Efficiency Video Coding (HEVC) scheme.

<Coding Unit>

In an Advanced Video Coding (AVC) scheme, a hierarchical structure based on a macroblock and a sub macroblock is defined. However, a macroblock of 16×16 pixels is not optimal for a large image frame such as an Ultra High Definition (UHD) frame (4000×2000 pixels) serving as a target of a next generation encoding scheme.

On the other hand, in the HEVC scheme, a coding unit (CU) is defined as illustrated in FIG. 1.

A CU is also referred to as a coding tree block (CTB), and serves as a partial area of an image in units of pictures, playing a role similar to that of a macroblock in the AVC scheme. The latter is fixed to a size of 16×16 pixels, whereas the former is not fixed to a certain size but is designated in the image compression information of each sequence.

For example, a largest coding unit (LCU) and a smallest coding unit (SCU) of a CU are specified in a sequence parameter set (SPS) included in encoded data to be output.

By setting split_flag=1 within a range in which a CU does not become smaller than the SCU, a coding unit can be divided into CUs having a smaller size. In the example of FIG. 1, the size of the LCU is 128, and the maximum hierarchical depth is 5. A CU of a size of 2N×2N is divided into CUs having a size of N×N, serving as the layer one level lower, when the value of split_flag is 1.

Further, a CU is divided into prediction units (PUs) that are areas (partial areas of an image in units of pictures) serving as processing units of intra or inter prediction, and divided into transform units (TUs) that are areas (partial areas of an image in units of pictures) serving as processing units of orthogonal transform. Currently, in the HEVC scheme, in addition to 4×4 and 8×8, orthogonal transforms of 16×16 and 32×32 can be used.

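The following is a minimal sketch, in Python, of how the split_flag-driven CU quadtree described above can be traversed. The helper names, the flag-reading callback, and the LCU/SCU sizes are assumptions for illustration; this is not HEVC reference software.

```python
# Hypothetical sketch of split_flag-driven CU quadtree traversal (not HEVC reference code).

def split_cu(x, y, size, scu_size, read_split_flag):
    """Recursively split a CU at (x, y) of the given size into smaller CUs.

    read_split_flag(x, y, size) is an assumed callback returning the parsed
    split_flag for that CU; splitting stops once the SCU size is reached.
    Returns a list of (x, y, size) leaf CUs.
    """
    if size > scu_size and read_split_flag(x, y, size) == 1:
        half = size // 2
        leaves = []
        # A split CU of size 2Nx2N is divided into four NxN CUs one level lower.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_cu(x + dx, y + dy, half, scu_size, read_split_flag)
        return leaves
    return [(x, y, size)]

# Example: a 64x64 LCU where only the top-left 32x32 CU is split further.
flags = {(0, 0, 64): 1, (0, 0, 32): 1}
cus = split_cu(0, 0, 64, 8, lambda x, y, s: flags.get((x, y, s), 0))
print(len(cus))  # 7 leaf CUs: four 16x16 CUs plus three 32x32 CUs
```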
As in the HEVC scheme, in the case of an encoding scheme in which a CU is defined and various kinds of processes are performed in units of CUs, in the AVC scheme, a macroblock can be considered to correspond to an LCU, and a block (sub block) can be considered to correspond to a CU. Further, in the AVC scheme, a motion compensation block can be considered to correspond to a PU. However, since a CU has a hierarchical structure, the size of an LCU of the topmost layer is commonly set to be larger than a macroblock in the AVC scheme, for example, such as 128×128 pixels.

Thus, hereinafter, an LCU is assumed to include a macroblock in the AVC scheme, and a CU is assumed to include a block (sub block) in the AVC scheme. In other words, a "block" used in the following description indicates an arbitrary partial area in a picture, and, for example, a size, a shape, and characteristics thereof are not limited. In other words, a "block" includes an arbitrary area (a processing unit) such as a TU, a PU, an SCU, a CU, an LCU, a sub block, a macroblock, or a slice. Of course, a "block" includes other partial areas (processing units) as well. When it is necessary to limit a size, a processing unit, or the like, it will be appropriately described.

<Mode Selection>

Moreover, in the AVC and HEVC encoding schemes, in order to achieve high encoding efficiency, it is important to select an appropriate prediction mode.

As an example of such a selection method, there is a method implemented in reference software (found at http://iphome.hhi.de/suehring/tml/index.htm) of H.264/MPEG-4 AVC called a joint model (JM).

In the JM, as will be described later, it is possible to select two mode determination methods, that is, a high complexity mode and a low complexity mode. In both modes, cost function values related to respective prediction modes are calculated, and a prediction mode having a smaller cost function value is selected as an optimal mode for a corresponding block or macroblock.

A cost function in the high complexity mode is represented as in the following Formula (1):

[Math. 1]

Cost(Mode∈Ω)=D+λ*R  (1)

Here, Ω indicates a universal set of candidate modes for encoding a corresponding block or macroblock, and D indicates differential energy between a decoded image and an input image when encoding is performed in a corresponding prediction mode. λ indicates Lagrange's undetermined multiplier given as a function of a quantization parameter. R indicates a total coding amount including an orthogonal transform coefficient when encoding is performed in a corresponding mode.

In other words, in order to perform encoding in the high complexity mode, it is necessary to perform a temporary encoding process once for all candidate modes in order to calculate the parameters D and R, and thus a large computation amount is required.

A cost function in the low complexity mode is represented by the following Formula (2):

[Math. 2]

Cost(Mode∈Ω)=D+QP2Quant(QP)*HeaderBit  (2)

Here, D is different from that of the high complexity mode and indicates differential energy between a prediction image and an input image. QP2Quant(QP) is given as a function of a quantization parameter QP, and HeaderBit indicates a coding amount related to information belonging to a header, such as a motion vector or a mode, including no orthogonal transform coefficient.

In other words, in the low complexity mode, it is necessary to perform a prediction process for respective candidate modes, but since a decoded image is not necessary, it is unnecessary to perform an encoding process. Thus, it is possible to implement a computation amount smaller than that in the high complexity mode.

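The following is a minimal sketch of how the two JM cost functions, Formulas (1) and (2), drive mode selection. The per-mode distortion, rate, header-bit, λ, and QP2Quant values below are illustrative placeholders, not values taken from a real encoder.

```python
# Illustrative mode decision using the JM cost functions (Formulas (1) and (2)).

def high_complexity_cost(D, R, lam):
    # Cost(Mode) = D + lambda * R, where D and R require a full trial encode.
    return D + lam * R

def low_complexity_cost(D, header_bits, qp2quant):
    # Cost(Mode) = D + QP2Quant(QP) * HeaderBit, no trial encode needed.
    return D + qp2quant * header_bits

# Hypothetical per-mode measurements for one block.
candidates = {
    "intra_16x16": {"D": 1200.0, "R": 300, "header_bits": 12},
    "inter_16x8":  {"D": 950.0,  "R": 420, "header_bits": 20},
}
lam, qp2quant = 8.0, 4.0

best_high = min(candidates, key=lambda m: high_complexity_cost(
    candidates[m]["D"], candidates[m]["R"], lam))
best_low = min(candidates, key=lambda m: low_complexity_cost(
    candidates[m]["D"], candidates[m]["header_bits"], qp2quant))
print(best_high, best_low)  # the mode with the smaller cost is selected in each mode
```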
<Scalable Coding>

Moreover, the existing image encoding schemes such as MPEG-2 and AVC have a scalability function. Scalable coding refers to a scheme of dividing (hierarchizing) an image into a plurality of layers and performing encoding for each layer. FIG. 2 is a diagram illustrating an example of a layered image encoding scheme.

As illustrated in FIG. 2, in hierarchization of an image, one image is divided into a plurality of layers based on a certain parameter with a scalability function. In other words, a hierarchized image (a layered image) includes a plurality of layers that differ in the value of a certain parameter. The plurality of layers of the layered image is configured with a base layer on which encoding and decoding are performed using only an image of its own layer without using an image of another layer and a non-base layer (which is also referred to as an "enhancement layer") on which encoding and decoding are performed using an image of another layer. For the non-base layer, an image of the base layer may be used, and an image of another non-base layer may be used.

Generally, in order to reduce redundancy, the non-base layer is configured with data (differential data) of a differential image between an image of its own and an image of another layer. For example, when one image is hierarchized into two layers, that is, the base layer and the non-base layer (also referred to as an "enhancement layer"), an image of a lower quality than an original image is obtained using only data of the base layer, and an original image (that is, a high-quality image) is obtained by combining data of the base layer with data of the enhancement layer.

As an image is hierarchized as described above, it is possible to obtain images of various qualities according to the situation. For example, for a terminal having a low processing capability such as a mobile phone, image compression information of only a base layer is transmitted, and a moving image of low spatial and temporal resolutions or a low quality is reproduced, and for a terminal having a high processing capability such as a television or a personal computer, image compression information of an enhancement layer as well as a base layer is transmitted, and a moving image of high spatial and temporal resolutions or a high quality is reproduced. In other words, image compression information according to a capability of a terminal or a network can be transmitted from a server without performing the transcoding process.

<Scalable Parameter>

In such layered image encoding and layered image decoding (scalable encoding and scalable decoding), a parameter with a scalability function is arbitrary. For example, spatial resolution as illustrated in FIG. 3 may be used as the parameter (spatial scalability). In the case of spatial scalability, respective layers have different image resolutions. In other words, each picture is hierarchized into two layers, that is, a base layer of a resolution spatially lower than that of an original image and an enhancement layer that is combined with an image of the base layer to obtain the original image (of the original spatial resolution), as illustrated in FIG. 3. Of course, this number of layers is an example, and each picture can be hierarchized into an arbitrary number of layers.

As another parameter having such scalability, for example, a temporal resolution (temporal scalability) as illustrated in FIG. 4 may be applied. In the case of the temporal scalability, respective layers have different frame rates. In other words, in this case, each picture is hierarchized into layers having different frame rates, a moving image of a high frame rate can be obtained by combining a layer of a high frame rate with a layer of a low frame rate, and an original moving image (of the original frame rate) can be obtained by combining all the layers, as illustrated in FIG. 4. The number of layers is an example, and each picture can be hierarchized into an arbitrary number of layers.

Further, as another parameter having such scalability, for example, there is a signal-to-noise ratio (SNR) (SNR scalability). In the case of the SNR scalability, respective layers have different SNRs. In other words, in this case, each picture is hierarchized into two layers, that is, a base layer of an SNR lower than that of an original image and an enhancement layer that is combined with an image of the base layer to obtain the original SNR, as illustrated in FIG. 5. In other words, for the base layer image compression information, information related to an image of a low PSNR is transmitted, and a high PSNR image can be reconstructed by combining this information with the enhancement layer image compression information. Of course, the number of layers is an example, and each picture can be hierarchized into an arbitrary number of layers.

A parameter other than the above-described examples may be applied as a parameter having scalability. For example, there is bit-depth scalability in which the base layer includes an 8-bit image, and a 10-bit image can be obtained by adding the enhancement layer to the base layer.

Further, there is chroma scalability in which the base layer includes a component image of a 4:2:0 format, and a component image of a 4:2:2 format can be obtained by adding the enhancement layer to the base layer.

<Area Division>

Moreover, in HEVC, it is possible to perform parallel processing based on a tile or wavefront parallel processing in addition to a slice that is also defined in AVC.

FIG. 6 is a diagram illustrating an example of a slice defined in HEVC. Similarly to that of AVC, a slice is a unit in which an encoding process is performed in a raster scan order, and includes a plurality of areas obtained by dividing a picture as illustrated in FIG. 6. Here, in HEVC, slice division can be performed only in units of LCUs. In FIG. 6, the entire square indicates a picture, and a small square indicates an LCU. Further, groups of LCUs having different patterns indicate slices. For example, a slice including LCUs of the first and second lines from the top, which is indicated by a hatched pattern, is a first slice (Slice #1) of the picture. A slice including LCUs of the third and fourth lines from the top, which is indicated by a white background, is a second slice (Slice #2) of the picture. A slice including LCUs of the fifth and sixth lines from the top, which is indicated by a gray background, is a third slice (Slice #3) of the picture. A slice including LCUs of the seventh and eighth lines from the top, which is indicated by a mesh pattern, is a fourth slice (Slice #4) of the picture. Of course, the number of slices or LCUs formed in the picture and the slice division method are arbitrary and not limited to the example of FIG. 6.

FIG. 7 illustrates an example of a tile defined in HEVC. A tile is an area obtained by dividing a picture in units of LCUs, similarly to a slice. However, a slice is an area obtained by dividing a picture so that LCUs are processed in a raster scan order, whereas a tile is an area obtained by dividing a picture into arbitrary rectangles as illustrated in FIG. 7.

In FIG. 7, the entire square indicates a picture, and a small square indicates an LCU. Further, groups of LCUs having different patterns indicate tiles. For example, a tile including the 4×4 LCUs on the upper left, which is indicated by a hatched pattern, is a first tile (Tile #1) of the picture. A tile including the 4×4 LCUs on the upper right, which is indicated by a white background, is a second tile (Tile #2) of the picture. A tile including the 4×4 LCUs on the lower left, which is indicated by a gray background, is a third tile (Tile #3) of the picture. A tile including the 4×4 LCUs on the lower right, which is indicated by a mesh pattern, is a fourth tile (Tile #4) of the picture. Of course, the number of tiles or LCUs formed in the picture and the tile division method are arbitrary and not limited to the example of FIG. 7.

In each tile formed as described above, the LCUs are processed in the raster scan order. Since the tile has a shorter boundary length than the slice, the tile has a characteristic in which a decrease in encoding efficiency by screen division is small.

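The following is a small sketch of mapping an LCU position to its tile for the kind of division just described. A uniform tile grid in LCU units is assumed for illustration; real HEVC tile column and row boundaries may be non-uniform.

```python
# Map an LCU coordinate to a tile index for a uniform tile grid (illustrative only).

def lcu_to_tile(lcu_x, lcu_y, lcus_per_tile_col, lcus_per_tile_row, num_tile_cols):
    tile_col = lcu_x // lcus_per_tile_col
    tile_row = lcu_y // lcus_per_tile_row
    # Tiles are numbered in raster scan order over the tile grid.
    return tile_row * num_tile_cols + tile_col

# The 8x8-LCU picture of FIG. 7 divided into 2x2 tiles of 4x4 LCUs each.
assert lcu_to_tile(2, 1, 4, 4, 2) == 0   # upper-left LCU  -> Tile #1 (index 0)
assert lcu_to_tile(5, 2, 4, 4, 2) == 1   # upper-right LCU -> Tile #2 (index 1)
assert lcu_to_tile(1, 6, 4, 4, 2) == 2   # lower-left LCU  -> Tile #3 (index 2)
assert lcu_to_tile(7, 7, 4, 4, 2) == 3   # lower-right LCU -> Tile #4 (index 3)
```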
The slices or tiles divided as described above can be processed independently of one another since there is no dependence relation of prediction, CABAC, or the like in encoding or decoding. In other words, for example, data of slices (or tiles) can be processed in parallel using different central processing units (CPUs) (or different cores).

<Area Division in Scalable Coding>

Moreover, in the scalable coding, encoding-related information of the base layer can be used in encoding of the enhancement layer. Content of the encoding-related information is arbitrary, but includes, for example, texture information such as a decoded image, syntax information such as the motion information or the intra prediction mode information, and the like.

In the scalable coding, after the picture of the base layer is encoded, the picture of the enhancement layer corresponding to the picture is encoded with reference to the encoding-related information of the base layer. In other words, after the base layer is encoded, the obtained encoding-related information of the base layer is supplied and appropriately used for encoding of the enhancement layer. The decoding is also performed in a similar procedure.

However, in the method of the related art, there was no method of controlling an area serving as a reference destination of the encoding-related information in the encoding and decoding of the enhancement layer as described above. In other words, for example, even when the encoding-related information differed by area, the entire picture of the base layer was consistently used as a reference target. For this reason, since even an area of the base layer picture that apparently need not be used as the reference destination is used as a reference target, the number of memory accesses and the like increases unnecessarily, and thus the workload of the encoding and decoding of the enhancement layer was likely to increase unnecessarily.

Further, even in the scalable coding, by removing a processing dependence relation between areas such as the slice or the tile as described above, it is possible to perform a process of each area independently and thus perform processes of the areas in parallel. In other words, in this case, it is possible to sequentially perform the encoding and decoding of the base layer and the encoding and decoding of the enhancement layer for each area.

However, when the encoding-related information of the base layer is referred to in the encoding and decoding of the enhancement layer, in the method of the related art, the entire picture is used as the reference target, and thus a dependence relation with another area occurs. Thus, it was likely to be difficult to perform the processes of the areas in parallel.

<Limitation of Reference Target>

In this regard, in the encoding and decoding of the enhancement layer, an area serving as the reference target of the encoding-related information of another layer (for example, the base layer or another enhancement layer) is controlled. For example, an area in which encoding-related information is referred to is limited to some areas of a picture of another layer.

FIG. 8 is a diagram illustrating an example of an aspect of limiting the reference target. In the case of FIG. 8, only a tile indicated by a mesh pattern of the base layer is designated as the reference target of the encoding-related information. In this case, the encoding-related information of the other areas (the areas of the white background) is neither included as the reference target nor read from a memory, regarding the encoding and decoding of the enhancement layer. Therefore, an increase in the workload of the encoding and decoding of the enhancement layer is suppressed accordingly.

The limiting method is arbitrary, but, for example, an area in which reference to encoding-related information of another layer is permitted may be designated. Further, for example, an area in which reference to encoding-related information of another layer is prohibited may be designated. Furthermore, for example, an area in which encoding-related information of another layer is referred to may be designated.

Since an area serving as a processing unit of encoding or decoding, such as a tile or a slice, is used as a reference target control unit of encoding-related information, it is possible to reduce the dependence relation between the areas, and thus it is possible to more easily perform processes independently in parallel.

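The following is a hedged sketch of the three limiting methods named above (a permit list, a prohibit list, or an explicit reference list). The data layout is an assumption for illustration and is not a bitstream format.

```python
# Resolve which base layer tiles may be referenced under one of the three
# control methods described above (illustrative data layout only).

def referable_tiles(all_base_tiles, control):
    kind, tiles = control  # e.g. ("permit", {5}), ("prohibit", {0, 1}), ("refer", {5})
    if kind == "permit":
        return set(tiles)                        # only permitted tiles may be read
    if kind == "prohibit":
        return set(all_base_tiles) - set(tiles)  # everything except the prohibited tiles
    if kind == "refer":
        return set(tiles)                        # tiles explicitly designated for reference
    raise ValueError(kind)

base_tiles = range(9)  # e.g. a 3x3 tile grid, identification numbers 0..8
print(referable_tiles(base_tiles, ("permit", {4})))          # {4}
print(referable_tiles(base_tiles, ("prohibit", {0, 1, 2})))  # {3, 4, 5, 6, 7, 8}
```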
<Specific Example of Area Control>

A more specific example of such control will be described.

For example, as in the example of FIG. 8, in the event of the encoding of the base layer, a picture is divided into tiles, and control is performed such that encoding-related information can be referred to in only a few of the tiles. In this case, for example, reference to encoding-related information is permitted for those few tiles. For example, in the encoding of the base layer, control information designating a tile in which reference to encoding-related information is permitted is generated and supplied for the encoding of the enhancement layer.

The encoding of the enhancement layer is executed according to the control information. In other words, only encoding-related information of a tile permitted by the control information can be referred to, regarding the encoding of the enhancement layer.

Further, regarding the encoding of the base layer, a setting method of setting an area in which reference to encoding-related information is permitted is arbitrary. For example, an area in which reference to encoding-related information is permitted may be designated by the user, an application, or the like, or an area in which reference to encoding-related information is permitted may be decided in advance.

For example, when there is an area in which reference is apparently unnecessary, such as a letter box at a common position of pictures of a moving image, the area may be excluded from "an area in which reference to encoding-related information is permitted," that is, other areas may be designated as "an area in which reference to encoding-related information is permitted" in advance before the pictures of the moving image data are encoded.

Further, for example, the user may designate "an area in which reference to encoding-related information is permitted" of each picture, or the user may designate a feature of an image, and an application or the like may designate an area having the designated feature in each picture as "an area in which reference to encoding-related information is permitted." Furthermore, an application or the like may perform area division (for example, tile division, slice division, or the like) so that an area including a certain feature (or a feature designated by the user) is formed in each picture.

For example, in the encoding of the base layer, an input image is assumed to be an image including a person (A of FIG. 9). An application performs a face recognition process on the image, and detects a partial area including the face of the person (B of FIG. 9). Then, the application performs tile division on the picture so that the partial area is set as one of the tiles (C of FIG. 9). Then, the application designates the tile (that is, the detected partial area) including the face of the person as "an area in which reference to encoding-related information is permitted" (the tile of a mesh pattern in D of FIG. 9).

As described above, the area division (forming of tiles or slices) may be performed while taking into account which encoding-related information will be referred to in the encoding of the enhancement layer. As a result, "the number of areas in which reference to encoding-related information is permitted" can be reduced. In other words, in the encoding of the enhancement layer, since it is possible to further narrow the range of the base layer to be referred to, it is possible to suppress an increase in workload.

Further, control of an area in which encoding-related information is referred to may be performed in units larger than the areas (tiles, slices, or the like) described above. For example, the control may be performed in units of pictures. Further, for example, the control may be performed in units of sequences. Furthermore, the control may be performed in units of moving image data. Moreover, the control information may be prepared in advance.

The example in which "an area in which reference to encoding-related information is permitted" is designated has been described above, but the control method is not limited to this example, and, for example, "an area in which reference to encoding-related information is prohibited" may be designated instead. In this case, tiles other than the few tiles in which reference is prohibited are used as the reference target.

In this case, for example, in the encoding of the base layer, it is desirable to generate control information designating a tile in which reference to encoding-related information is prohibited and supply the control information for the encoding of the enhancement layer.

The encoding of the enhancement layer is executed according to the control information, similarly to the case in which reference is permitted. In other words, only encoding-related information of tiles other than the tiles prohibited by the control information can be referred to, regarding the encoding of the enhancement layer.

Of course, in this case, a setting method is arbitrary, similarly to the case in which reference is permitted. Further, the number of areas in which reference to encoding-related information of the base layer is permitted (or prohibited) may be one or several.

As described above, regardless of whether reference to encoding-related information is permitted or prohibited, in the encoding of the enhancement layer, it is arbitrary whether or not a picture is divided into tiles (or slices). Further, how to perform the division is also arbitrary. Even if the enhancement layer is encoded in units of areas such as tiles or slices, encoding of each area is performed based on the control information. In other words, only encoding-related information of a tile (or a slice) permitted by the control information (or other than a prohibited tile (or slice)) can be referred to in encoding of all areas.

As described above, when the enhancement layer is encoded in units of areas such as tiles or slices, an area in which reference to encoding-related information is permitted (or prohibited) may be set for each area of the enhancement layer. In other words, an area in which reference to encoding-related information of the base layer is permitted (or prohibited) may not be the same in each area of the enhancement layer.

For example, the control information may be information (for example, a correspondence table) in which the areas of the enhancement layer and the areas of the base layer are associated (synchronized). In this case, only the encoding-related information of the areas of the base layer associated by the correspondence table can be referred to in encoding of the areas of the enhancement layer.

It is possible to perform more appropriate control by controlling the reference destination of the encoding-related information for each area of the enhancement layer as described above. Therefore, it is possible to suppress an increase in encoding or decoding workload. Further, it is possible to reduce the dependence relation between the areas.

For example, the areas of the enhancement layer may be permitted to refer to encoding-related information of different areas of the base layer as illustrated in FIG. 10. In the case of the example of FIG. 10, the reference destination of the encoding-related information of the base layer in encoding of a tile E₀ of the enhancement layer is limited to a tile B₀ of the base layer. The reference destination of the encoding-related information of the base layer in encoding of a tile E₁ of the enhancement layer is limited to a tile B₁ of the base layer. The reference destination of the encoding-related information of the base layer in encoding of a tile E₂ of the enhancement layer is limited to a tile B₂ of the base layer. The reference destination of the encoding-related information of the base layer in encoding of a tile E₃ of the enhancement layer is limited to a tile B₃ of the base layer.

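The following is a minimal sketch of such a correspondence table, represented as a mapping from each enhancement layer area to the base layer areas whose encoding-related information it may reference. The one-to-one layout of FIG. 10 is used as an assumed example; an area could also map to several areas.

```python
# Correspondence table (enhancement layer tile -> referable base layer tiles),
# here the one-to-one layout of FIG. 10 (illustrative only).
sync_table = {
    "E0": ["B0"],
    "E1": ["B1"],
    "E2": ["B2"],
    "E3": ["B3"],
}

def fetch_reference_info(enh_tile, base_info, table):
    """Read only the encoding-related information of the associated base layer tiles."""
    return {b: base_info[b] for b in table[enh_tile]}

base_info = {"B0": "recon+mv of B0", "B1": "recon+mv of B1",
             "B2": "recon+mv of B2", "B3": "recon+mv of B3"}
print(fetch_reference_info("E2", base_info, sync_table))  # {'B2': 'recon+mv of B2'}
```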
Since the areas of the enhancement layer are permitted to refer to the encoding-related information of the different areas of the base layer as in the example of FIG. 10, it is possible to reduce the dependence relation between the areas and perform the parallel process more easily as illustrated in FIG. 11.

In the case of the example of FIG. 11, a first CPU #0 performs encoding on tiles #0 of respective frames in the order of a tile #0 (B₀_0) of the base layer of a frame #0, a tile #0 (E₀_0) of the enhancement layer of the frame #0, a tile #0 (B₀_1) of the base layer of a frame #1, a tile #0 (E₀_1) of the enhancement layer of the frame #1, a tile #0 (B₀_2) of the base layer of a frame #2, and a tile #0 (E₀_2) of the enhancement layer of the frame #2.

In parallel to this, a second CPU #1 performs encoding on tiles #1 of respective frames in the order of a tile #1 (B₁_0) of the base layer of the frame #0, a tile #1 (E₁_0) of the enhancement layer of the frame #0, a tile #1 (B₁_1) of the base layer of the frame #1, a tile #1 (E₁_1) of the enhancement layer of the frame #1, a tile #1 (B₁_2) of the base layer of the frame #2, and a tile #1 (E₁_2) of the enhancement layer of the frame #2.

Further, in parallel to the above processes, a third CPU #2 performs encoding on tiles #2 of respective frames in the order of a tile #2 (B₂_0) of the base layer of the frame #0, a tile #2 (E₂_0) of the enhancement layer of the frame #0, a tile #2 (B₂_1) of the base layer of the frame #1, a tile #2 (E₂_1) of the enhancement layer of the frame #1, a tile #2 (B₂_2) of the base layer of the frame #2, and a tile #2 (E₂_2) of the enhancement layer of the frame #2.

Further, in parallel to the above processes, a fourth CPU #3 performs encoding on tiles #3 of respective frames in the order of a tile #3 (B₃_0) of the base layer of the frame #0, a tile #3 (E₃_0) of the enhancement layer of the frame #0, a tile #3 (B₃_1) of the base layer of the frame #1, a tile #3 (E₃_1) of the enhancement layer of the frame #1, a tile #3 (B₃_2) of the base layer of the frame #2, and a tile #3 (E₃_2) of the enhancement layer of the frame #2.

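The following is a sketch of the per-tile scheduling of FIG. 11 using a thread pool: each worker encodes the base layer tile and then the co-located enhancement layer tile for every frame, so no cross-tile dependence arises. The encode functions are placeholders, not a real encoder.

```python
# Per-tile pipeline of FIG. 11: each worker handles one tile index across frames,
# encoding base layer then enhancement layer, independently of the other tiles.
from concurrent.futures import ThreadPoolExecutor

def encode_base_tile(frame, tile):           # placeholder for the real base layer encoder
    return f"B{tile}_{frame}"

def encode_enh_tile(frame, tile, base_ref):  # placeholder; refers only to base_ref
    return f"E{tile}_{frame}(ref={base_ref})"

def tile_worker(tile, num_frames):
    out = []
    for frame in range(num_frames):
        base = encode_base_tile(frame, tile)             # e.g. B0_0
        out.append(encode_enh_tile(frame, tile, base))   # e.g. E0_0 referring only to B0_0
    return out

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda t: tile_worker(t, 3), range(4)))
for tile, res in enumerate(results):
    print(f"tile #{tile}: {res}")
```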
The designation of an area (a tile, a slice, or the like) of the base layer in the control information may be performed based on a position (for example, an offset value from the head) of data of each area included in the encoded data (bitstream) or may be performed based on an identification number allocated to each area of the base layer.

For example, as illustrated in FIG. 12, an identification number may be allocated to each area in the raster scan order, and an area in which reference to encoding-related information is permitted or prohibited may be designated using the identification number. Of course, a method of allocating the identification number is arbitrary, and the raster scan order is an example.

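The following is a small sketch of allocating identification numbers to areas in the raster scan order, as in FIG. 12. The grid dimensions are assumed for illustration.

```python
# Allocate identification numbers to tiles in raster scan order (as in FIG. 12).
def raster_ids(num_tile_cols, num_tile_rows):
    return {(row, col): row * num_tile_cols + col
            for row in range(num_tile_rows)
            for col in range(num_tile_cols)}

ids = raster_ids(num_tile_cols=4, num_tile_rows=3)
print(ids[(0, 0)], ids[(0, 3)], ids[(1, 0)], ids[(2, 3)])  # 0 3 4 11
```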
The above example has been described in connection with the case of encoding but is similarly applied to the case of decoding.

<Transmission of Control Information>

The control information used to control reference to the encoding-related information may be transmitted from an encoding side to a decoding side. As the control information is transmitted to the decoding side, the control information can be used in decoding. In other words, similarly to the case of encoding, decoding workload can be reduced. In this case, the control information may be specified in, for example, a picture parameter set (PPS) or a slice header. Of course, the control information can be transmitted by an arbitrary method. For example, the control information may be specified in a sequence parameter set, a video parameter set, or the like. Further, the control information may be transmitted as data separate from the encoded data of the image data.

FIGS. 13 and 14 illustrate an example of syntax of the picture parameter set of the enhancement layer when the control information is transmitted through the picture parameter set.

In the case of this example, as illustrated in FIG. 13, tile_setting_from_ref_layer_flag is transmitted as information indicating whether or not area division of a current layer (that is, the enhancement layer) serving as the processing target is similar to area division of another layer (that is, the base layer). When a value thereof is 1, it indicates that a method of the area division (for example, the tile division) in the enhancement layer is similar to that of the base layer.

For example, when the area division of the enhancement layer is similar to the area division of the base layer, it is possible to detect the area division of the enhancement layer with reference to the area division information of the base layer in the decoding of the enhancement layer, and thus it is unnecessary to transmit information (for example, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, and the like in FIG. 13) related to the area division of the enhancement layer. Therefore, it is possible to suppress a decrease in encoding efficiency.

Further, as illustrated in FIG. 14, inter_layer_tile_prediction_restriction_flag is transmitted as information indicating whether or not to control an area in which encoding-related information is referred to. When a value thereof is 1, the control information used to control reference to encoding-related information is transmitted (second to ninth lines from the top in FIG. 14). In the case of the example of FIG. 14, the enhancement layer is encoded in units of areas, and the control information used to control an area of the base layer in which encoding-related information is referred to is transmitted for each area of the enhancement layer.

Since the information indicating whether or not to control an area in which encoding-related information is referred to is transmitted as described above, when an area in which encoding-related information is referred to is not controlled, transmission of the control information can be omitted (the control information can be transmitted only when an area in which encoding-related information is referred to is controlled). Therefore, it is possible to suppress a decrease in encoding efficiency.

In the case of the example of FIG. 14, a current area serving as the processing target of the enhancement layer is designated by a position (i, j) in the horizontal direction and the vertical direction in the area array. Further, the number (num_ref_tiles_minus1) of areas of the base layer serving as the reference destination and those areas are designated for each area. Furthermore, the area of the base layer serving as the reference destination is designated by an identification number (ref_tile[k]). The identification number is allocated to each area of the base layer in the raster scan order as in the example of FIG. 12.

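The following is a hedged sketch of emitting the control information fields just named (inter_layer_tile_prediction_restriction_flag, num_ref_tiles_minus1, ref_tile[k]) for each enhancement layer area. The encoding to a flat list of symbols and the surrounding structure are simplified assumptions, not the actual HEVC extension syntax.

```python
# Simplified writer for the per-area control information discussed above.
# Field names follow FIG. 14; the output format is illustrative, not a real bitstream.

def write_control_info(ref_map, restrict):
    """ref_map: {(i, j): [ref tile ids]} for each enhancement layer tile at position (i, j)."""
    syms = [("inter_layer_tile_prediction_restriction_flag", int(restrict))]
    if restrict:
        for (i, j), refs in sorted(ref_map.items()):
            syms.append(("tile_pos", (i, j)))
            syms.append(("num_ref_tiles_minus1", len(refs) - 1))
            for k, tile_id in enumerate(refs):
                syms.append((f"ref_tile[{k}]", tile_id))
    return syms

# Enhancement tile (0, 0) may reference base tiles 0 and 1; tile (0, 1) only base tile 1.
print(write_control_info({(0, 0): [0, 1], (0, 1): [1]}, restrict=True))
```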
The current area of the enhancement layer and the area of the base layer serving as the reference destination can be designated by an arbitrary method other than the above-mentioned methods. For example, the current area of the enhancement layer may be designated using an identification number. For example, the area of the base layer serving as the reference destination may be designated by a position (i, j) in the horizontal direction and the vertical direction in the area array or may be designated by information (for example, an offset value from the top) indicating a position of the area data in the encoded data.

FIGS. 15 to 17 illustrate an example of syntax of the slice header of the enhancement layer when the control information is transmitted through the slice header. As illustrated in FIGS. 15 to 17, in the case of the slice header, the control information is transmitted by a method similar to that in the case of the picture parameter set described with reference to FIGS. 13 and 14.

In the example of FIGS. 13 to 17, an example in which a tile is used as an area has been described, but what has been described above can be similarly applied to a slice used as an area.

Further, as described above, the encoding-related information includes texture information such as a decoded image or syntax information such as the motion information or intra prediction mode information, for example. In other words, for example, as inter-layer prediction in which prediction is performed with reference to information of another layer, there are inter-layer texture prediction in which texture information such as decoded image information of the base layer is used for prediction and inter-layer syntax prediction in which syntax information such as the motion information and the intra prediction mode information of the base layer is used for prediction. In the present technology, control of the reference destination of the encoding-related information may be performed independently in each prediction process. In other words, for example, a reference destination area of the texture information and a reference destination area of the syntax information may be independently designated.
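A minimal sketch, in Python, of such independently designated reference areas is shown below: one set of base-layer areas is held for inter-layer texture prediction and a separate set for inter-layer syntax prediction. The dataclass and field names are hypothetical and only illustrate the idea.

```python
# Illustrative sketch: per enhancement-layer area, texture and syntax
# reference destinations in the base layer can be designated independently.

from dataclasses import dataclass, field
from typing import List


@dataclass
class InterLayerReference:
    texture_ref_tiles: List[int] = field(default_factory=list)  # decoded image
    syntax_ref_tiles: List[int] = field(default_factory=list)   # MV / intra mode


# Enhancement tile (1, 0): texture from base tiles 0 and 1, syntax from tile 1 only.
ref = InterLayerReference(texture_ref_tiles=[0, 1], syntax_ref_tiles=[1])
print(ref)
```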

2. First Embodiment

<Image Encoding Device>

Next, a device implementing the present technology and a method thereof will be described. FIG. 18 is a diagram illustrating an image encoding device as an example of an image processing device to which the present technology is applied. An image encoding device 100 illustrated in FIG. 18 is a device that performs layered image encoding. As illustrated in FIG. 18, the image encoding device 100 includes a base layer image encoding section 101, an enhancement layer image encoding section 102, and a multiplexing unit 103.

The base layer image encoding section 101 encodes a base layer image, and generates a base layer image encoded stream. The enhancement layer image encoding section 102 encodes an enhancement layer image, and generates an enhancement layer image encoded stream. The multiplexing unit 103 multiplexes the base layer image encoded stream generated in the base layer image encoding section 101 and the enhancement layer image encoded stream generated in the enhancement layer image encoding section 102, and generates a layered image encoded stream. The multiplexing unit 103 transmits the generated layered image encoded stream to the decoding side.

In encoding of the base layer image, the base layer image encoding section 101 performs the area division such as the tile division or the slice division on the current picture, and performs the encoding for each area (a tile, a slice, or the like). The base layer image encoding section 101 supplies the encoding-related information of the base layer obtained in the encoding to the enhancement layer image encoding section 102.

In encoding of the enhancement layer image, the enhancement layer image encoding section 102 performs the area division such as the tile division or the slice division on the current picture, and performs the encoding for each area (a tile, a slice, or the like). In this event, the enhancement layer image encoding section 102 controls an area serving as the reference destination of the encoding-related information of the base layer. More specifically, the enhancement layer image encoding section 102 associates the areas of the enhancement layer with the areas of the base layer serving as the reference destination of the encoding-related information, and generates the control information indicating the correspondence relation thereof.

The enhancement layer image encoding section 102 appropriately refers to the encoding-related information of the base layer according to the control of the control information, and encodes the enhancement layer image. The enhancement layer image encoding section 102 transmits the control information to the decoding side (as the layered image encoded stream) through the multiplexing unit 103.
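The overall flow described above can be summarized by the following heavily simplified Python sketch: the base layer is encoded per area, its encoding-related information is handed to the enhancement-layer encoder, the enhancement layer is encoded with reference restricted by the control information, and the two streams are multiplexed. Every function here is a hypothetical placeholder returning toy strings, not the actual device API.

```python
# Illustrative, highly simplified flow of the image encoding device 100.

def encode_base_layer(picture, tiles):
    # per-tile "encoding" producing a placeholder bitstream fragment per area
    stream = "|".join(f"BL[{t}]:{picture}" for t in tiles)
    encoding_related_info = {t: f"recon({picture},tile{t})" for t in tiles}
    return stream, encoding_related_info


def encode_enhancement_layer(picture, base_info, control_info):
    parts = []
    for tile, ref_tiles in control_info.items():
        refs = [base_info[r] for r in ref_tiles]   # only the designated base areas
        parts.append(f"EL[{tile}]<-{refs}:{picture}")
    return "|".join(parts)


def multiplex(base_stream, enh_stream):
    return base_stream + "||" + enh_stream


base_tiles = [0, 1, 2, 3]
control_info = {0: [0], 1: [0, 1], 2: [2], 3: [3]}   # enh tile -> base ref tiles
bl_stream, bl_info = encode_base_layer("picN_base", base_tiles)
el_stream = encode_enhancement_layer("picN_enh", bl_info, control_info)
print(multiplex(bl_stream, el_stream))
```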

<Base Layer Image Encoding Section>

FIG. 19 is a block diagram illustrating an example of a main configuration of the base layer image encoding section 101 of FIG. 18. As illustrated in FIG. 19, the base layer image encoding section 101 has an A/D converting section 111, a screen reordering buffer 112, an operation section 113, an orthogonal transform section 114, a quantization section 115, a lossless encoding section 116, an accumulation buffer 117, an inverse quantization section 118, and an inverse orthogonal transform section 119. In addition, the base layer image encoding section 101 has an operation section 120, a loop filter 121, a frame memory 122, a selecting section 123, an intra prediction section 124, an inter prediction section 125, a predictive image selecting section 126, and a rate control section 127. Further, the base layer image encoding section 101 has a base layer area division setting section 128.

The A/D converting section 111 performs A/D conversion on input image data (the base layer image information), and supplies the converted image data (digital data) to be stored in the screen reordering buffer 112. The screen reordering buffer 112 reorders images of frames stored in a display order in a frame order for encoding according to a Group Of Pictures (GOP), and supplies the images in which the frame order is reordered to the operation section 113. The screen reordering buffer 112 also supplies the images in which the frame order is reordered to the intra prediction section 124 and the inter prediction section 125.

The operation section 113 subtracts a predictive image supplied from the intra prediction section 124 or the inter prediction section 125 via the predictive image selecting section 126 from an image read from the screen reordering buffer 112, and outputs differential information thereof to the orthogonal transform section 114. For example, in the case of an image that has been subjected to intra coding, the operation section 113 subtracts the predictive image supplied from the intra prediction section 124 from the image read from the screen reordering buffer 112. Further, for example, in the case of an image that has been subjected to inter coding, the operation section 113 subtracts the predictive image supplied from the inter prediction section 125 from the image read from the screen reordering buffer 112.

The orthogonal transform section 114 performs an orthogonal transform such as a discrete cosine transform or a Karhunen-Loève transform on the differential information supplied from the operation section 113. The orthogonal transform section 114 supplies transform coefficients to the quantization section 115.

The quantization section 115 quantizes the transform coefficients supplied from the orthogonal transform section 114. The quantization section 115 sets a quantization parameter based on information related to a target value of a coding amount supplied from the rate control section 127, and performs the quantizing. The quantization section 115 supplies the quantized transform coefficients to the lossless encoding section 116.

The lossless encoding section 116 encodes the transform coefficients quantized in the quantization section 115 according to an arbitrary encoding scheme. Since coefficient data is quantized under control of the rate control section 127, the coding amount becomes a target value (or approaches a target value) set by the rate control section 127.

The lossless encoding section 116 acquires information indicating an intra prediction mode or the like from the intra prediction section 124, and acquires information indicating an inter prediction mode, differential motion vector information, or the like from the inter prediction section 125. Further, the lossless encoding section 116 appropriately generates an NAL unit of the base layer including a sequence parameter set (SPS), a picture parameter set (PPS), and the like.

The lossless encoding section 116 encodes information (which is also referred to as “base layer area division information”) related to area (for example, a tile, a slice, or the like) division of the base layer set by the base layer area division setting section.

The lossless encoding section 116 encodes various kinds of information according to an arbitrary encoding scheme, and sets (multiplexes) the encoded information as part of encoded data (also referred to as an “encoded stream”). The lossless encoding section 116 supplies the encoded data obtained by the encoding to be accumulated in the accumulation buffer 117.

Examples of the encoding scheme of the lossless encoding section 116 include variable length coding and arithmetic coding. As the variable length coding, for example, there is Context-Adaptive Variable Length Coding (CAVLC) defined in the H.264/AVC scheme. As the arithmetic coding, for example, there is Context-Adaptive Binary Arithmetic Coding (CABAC).

The accumulation buffer 117 temporarily holds the encoded data (base layer encoded data) supplied from the lossless encoding section 116. The accumulation buffer 117 outputs the held base layer encoded data to a recording device (recording medium), a transmission path, or the like (not illustrated) at a subsequent stage under certain timing. In other words, the accumulation buffer 117 serves as a transmitting section that transmits the encoded data as well.

The transform coefficients quantized by the quantization section 115 are also supplied to the inverse quantization section 118. The inverse quantization section 118 inversely quantizes the quantized transform coefficients according to a method corresponding to the quantization performed by the quantization section 115. The inverse quantization section 118 supplies the obtained transform coefficients to the inverse orthogonal transform section 119.

The inverse orthogonal transform section 119 performs an inverse orthogonal transform on the transform coefficients supplied from the inverse quantization section 118 according to a method corresponding to the orthogonal transform process performed by the orthogonal transform section 114. An output (restored differential information) that has been subjected to the inverse orthogonal transform is supplied to the operation section 120.

The operation section 120 obtains a locally decoded image (a decoded image) by adding the predictive image supplied from the intra prediction section 124 or the inter prediction section 125 via the predictive image selecting section 126 to the restored differential information serving as an inverse orthogonal transform result supplied from the inverse orthogonal transform section 119. The decoded image is supplied to the loop filter 121 or the frame memory 122.

The loop filter 121 includes a deblock filter, an adaptive loop filter, or the like, and appropriately performs a filter process on the reconstructed image supplied from the operation section 120. For example, the loop filter 121 performs the deblock filter process on the reconstructed image, and removes block distortion of the reconstructed image. Further, for example, the loop filter 121 improves the image quality by performing the loop filter process on the deblock filter process result (the reconstructed image from which the block distortion has been removed) using a Wiener filter. The loop filter 121 supplies the filter process result (hereinafter referred to as a “decoded image”) to the frame memory 122.

The loop filter 121 may further perform any other arbitrary filter process on the reconstructed image. The loop filter 121 may supply information used in the filter process such as a filter coefficient to the lossless encoding section 116 as necessary so that the information can be encoded.

The frame memory 122 stores the supplied decoded image, and supplies the stored decoded image to the selecting section 123 as a reference image under certain timing.

More specifically, the frame memory 122 stores the reconstructed image supplied from the operation section 120 and the decoded image supplied from the loop filter 121. The frame memory 122 supplies the stored reconstructed image to the intra prediction section 124 via the selecting section 123 under certain timing or based on an external request, for example, from the intra prediction section 124. Further, the frame memory 122 supplies the stored decoded image to the inter prediction section 125 via the selecting section 123 under certain timing or based on an external request, for example, from the inter prediction section 125.

The selecting section 123 selects a supply destination of the reference image supplied from the frame memory 122. For example, in the case of the intra prediction, the selecting section 123 supplies the reference image (a pixel value of a current picture) supplied from the frame memory 122 to the intra prediction section 124. Further, for example, in the case of the inter prediction, the selecting section 123 supplies the reference image supplied from the frame memory 122 to the inter prediction section 125.

The intra prediction section 124 performs the prediction process on the current picture that is an image of a processing target frame, and generates a prediction image. The intra prediction section 124 performs the prediction process in units of certain blocks (using a block as a processing unit). In other words, the intra prediction section 124 generates a prediction image of a current block serving as the processing target in the current picture. In this event, the intra prediction section 124 performs the prediction process (intra-screen prediction (which is also referred to as “intra prediction”)) using a reconstructed image supplied as the reference image from the frame memory 122 via the selecting section 123. In other words, the intra prediction section 124 generates the prediction image using pixel values neighboring the current block which are included in the reconstructed image. The neighboring pixel value used for the intra prediction is a pixel value of a pixel which has been previously processed in the current picture. As the intra prediction (that is, a method of generating the prediction image), a plurality of methods (which are also referred to as “intra prediction modes”) is prepared as candidates in advance. The intra prediction section 124 performs the intra prediction in the plurality of intra prediction modes prepared in advance.

The intra prediction section 124 generates predictive images in all the intra prediction modes serving as the candidates, evaluates cost function values of the predictive images using the input image supplied from the screen reordering buffer 112, and selects an optimal mode. When the optimal intra prediction mode is selected, the intra prediction section 124 supplies the predictive image generated in the optimal mode to the predictive image selecting section 126.

As described above, the intra prediction section 124 appropriately supplies, for example, the intra prediction mode information indicating the employed intra prediction mode to the lossless encoding section 116 so that the information is encoded.

The inter prediction section 125 performs the prediction process on the current picture, and generates a prediction image. The inter prediction section 125 performs the prediction process in units of certain blocks (using a block as a processing unit). In other words, the inter prediction section 125 generates a prediction image of a current block serving as the processing target in the current picture. In this event, the inter prediction section 125 performs the prediction process using image data of the input image supplied from the screen reordering buffer 112 and image data of a decoded image supplied as the reference image from the frame memory 122. The decoded image is an image (another picture that is not the current picture) of a frame which has been processed before the current picture. In other words, the inter prediction section 125 performs the prediction process (inter-screen prediction (which is also referred to as “inter prediction”)) of generating the prediction image using an image of another picture.

The inter prediction includes motion prediction and motion compensation. More specifically, the inter prediction section 125 performs the motion prediction on the current block using the input image and the reference image, and detects a motion vector. Then, the inter prediction section 125 performs a motion compensation process using the reference image according to the detected motion vector, and generates the prediction image (inter prediction image information) of the current block. As the inter prediction (that is, a method of generating the prediction image), a plurality of methods (which are also referred to as “inter prediction modes”) is prepared as candidates in advance. The inter prediction section 125 performs the inter prediction in the plurality of inter prediction modes prepared in advance.
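The following Python sketch illustrates the general technique of motion prediction and motion compensation mentioned above with a minimal full-search block matching; it is only an illustration under simplifying assumptions (SAD matching, integer motion, a tiny search range), not the actual behavior of the inter prediction section 125, which evaluates many prediction modes with rate-distortion cost functions.

```python
# Minimal full-search block-matching sketch of motion prediction/compensation.

import numpy as np


def motion_search(current, reference, bx, by, block=8, search=4):
    """Return the motion vector (dx, dy) minimising SAD for one block."""
    cur = current[by:by + block, bx:bx + block].astype(np.int32)
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x and 0 <= y and x + block <= reference.shape[1] \
                    and y + block <= reference.shape[0]:
                ref = reference[y:y + block, x:x + block].astype(np.int32)
                sad = np.abs(cur - ref).sum()
                if sad < best[2]:
                    best = (dx, dy, sad)
    return best[:2]


def motion_compensate(reference, bx, by, mv, block=8):
    """Copy the reference block pointed to by the motion vector (the prediction image)."""
    dx, dy = mv
    return reference[by + dy:by + dy + block, bx + dx:bx + dx + block]


rng = np.random.default_rng(0)
ref_frame = rng.integers(0, 256, (32, 32), dtype=np.uint8)
cur_frame = np.roll(ref_frame, shift=(1, 2), axis=(0, 1))  # content moved 2 right, 1 down
mv = motion_search(cur_frame, ref_frame, 8, 8)
print(mv)  # -> (-2, -1): the block content came from 2 px left, 1 px up in the reference
print(np.array_equal(motion_compensate(ref_frame, 8, 8, mv), cur_frame[8:16, 8:16]))  # True
```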

The inter prediction section 125 generates predictive images in all the inter prediction modes serving as candidates. The inter prediction section 125 evaluates cost function values of the predictive images using the input image supplied from the screen reordering buffer 112, information of the generated differential motion vector, and the like, and selects an optimal mode. When the optimal inter prediction mode is selected, the inter prediction section 125 supplies the predictive image generated in the optimal mode to the predictive image selecting section 126.

The inter prediction section 125 supplies information indicating the employed inter prediction mode, information necessary for performing processing in the inter prediction mode in decoding of the encoded data, and the like to the lossless encoding section 116 so that the information is encoded. For example, as the necessary information, there is information of a generated differential motion vector, and as prediction motion vector information, there is a flag indicating an index of a prediction motion vector.

The predictive image selecting section 126 selects a supply source of the prediction image to be supplied to the operation section 113 and the operation section 120. For example, in the case of the intra coding, the predictive image selecting section 126 selects the intra prediction section 124 as the supply source of the predictive image, and supplies the predictive image supplied from the intra prediction section 124 to the operation section 113 and the operation section 120. For example, in the case of the inter coding, the predictive image selecting section 126 selects the inter prediction section 125 as the supply source of the predictive image, and supplies the predictive image supplied from the inter prediction section 125 to the operation section 113 and the operation section 120.

The rate control section 127 controls a rate of a quantization operation of the quantization section 115 based on the coding amount of the encoded data accumulated in the accumulation buffer 117 such that no overflow or underflow occurs.

The base layer area division setting section 128 sets the area division (for example, a tile, a slice, or the like) to the picture of the base layer. The base layer area division setting section 128 supplies this setting to the respective sections of the base layer image encoding section 101 as the base layer area division information. The respective sections of the base layer image encoding section 101 execute processing for each area indicated by the base layer area division information. Encoding of each area is independently processed. Therefore, for example, it is possible to process encoding of the areas in parallel using a plurality of CPUs.
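The parallelism made possible by the independent per-area encoding can be sketched as below. The per-tile "encoding" function is a hypothetical placeholder; the point of the sketch is only that independent areas of one picture can be dispatched to separate CPUs.

```python
# Illustrative sketch: independently encoded areas processed on multiple CPUs.

from concurrent.futures import ProcessPoolExecutor


def encode_tile(args):
    tile_id, samples = args
    # placeholder for transform/quantization/entropy coding of one tile
    return tile_id, sum(samples) % 255


if __name__ == "__main__":
    picture_tiles = [(t, list(range(t, t + 64))) for t in range(4)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        encoded = dict(pool.map(encode_tile, picture_tiles))
    print(encoded)   # tiles encoded independently, possibly in parallel
```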

The base layer image encoding section 101 performs encoding without referring to another layer. In other words, the intra prediction section 124 and the inter prediction section 125 do not refer to the encoding-related information of the other layers.

The frame memory 122 supplies the image data of the decoded image of the base layer stored therein to the enhancement layer image encoding section 102 as the encoding-related information of the base layer.

Similarly, the intra prediction section 124 supplies the intra prediction mode information and the like to the enhancement layer image encoding section 102 as the encoding-related information of the base layer.

Similarly, the inter prediction section 125 supplies the motion information and the like to the enhancement layer image encoding section 102 as the encoding-related information of the base layer.

Further, the base layer area division setting section 128 supplies the base layer area division information to the enhancement layer image encoding section 102 as well.

<Enhancement Layer Image Encoding Section>

FIG. 20 is a block diagram illustrating an example of a main configuration of the enhancement layer image encoding section 102 of FIG. 18. As illustrated in FIG. 20, the enhancement layer image encoding section 102 has basically a configuration similar to that of the base layer image encoding section 101 of FIG. 19.

In other words, the enhancement layer image encoding section 102 includes an A/D converting section 131, a screen reordering buffer 132, an operation section 133, an orthogonal transform section 134, a quantization section 135, a lossless encoding section 136, an accumulation buffer 137, an inverse quantization section 138, and an inverse orthogonal transform section 139 as illustrated in FIG. 20. The enhancement layer image encoding section 102 further includes an operation section 140, a loop filter 141, a frame memory 142, a selecting section 143, an intra prediction section 144, an inter prediction section 145, a prediction image selecting section 146, and a rate control section 147.

The A/D converting section 131 to the rate control section 147 correspond to the A/D converting section 111 to the rate control section 127 of FIG. 19, and perform processing similar to that performed by the corresponding processing sections. However, the respective sections of the enhancement layer image encoding section 102 perform the process of encoding the enhancement layer image information rather than the base layer. Therefore, the description of the A/D converting section 111 to the rate control section 127 of FIG. 19 can be applied as a description of processing of the A/D converting section 131 to the rate control section 147, but in this case, it is necessary to set data of the enhancement layer as data to be processed instead of data of the base layer. Further, it is necessary to interpret the processing sections serving as the data input source and the data output destination as the corresponding processing sections of the A/D converting section 131 to the rate control section 147.

Further, the enhancement layer image encoding section 102 does not include the base layer area division setting section 128 but includes an area synchronization section 148 and an up-sampling section 149.

The area synchronization section 148 sets the area division (for example, a tile, a slice, or the like) to the picture of the enhancement layer. The area synchronization section 148 supplies this setting to the respective sections of the enhancement layer image encoding section 102 as the enhancement layer area division information.

Further, the area synchronization section 148 controls an area in which the encoding-related information of the base layer is referred to, regarding the encoding of the enhancement layer. For example, the area synchronization section 148 generates the control information used to control an area in which the encoding-related information of the base layer is referred to, and controls the intra prediction section 144 or the inter prediction section 145 according to the control information. In other words, the area synchronization section 148 controls the area of the base layer in which the encoding-related information is referred to when the intra prediction section 144 or the inter prediction section 145 performs the inter-layer prediction.

Further, the area synchronization section 148 supplies the control information to the lossless encoding section 136 so that the control information is encoded and transmitted to the decoding side.

The enhancement layer image encoding section 102 performs encoding with reference to the encoding-related information of another layer (for example, the base layer).

The area synchronization section 148 acquires the base layer area division information supplied from the base layer image encoding section 101. The area synchronization section 148 generates the control information using the base layer area division information.

The up-sampling section 149 acquires the encoding-related information of the base layer supplied from the base layer image encoding section 101. For example, the up-sampling section 149 acquires the texture information such as the decoded image (which is also referred to as a “decoded base layer image”) of the base layer as the encoding-related information. For example, when the inter-layer syntax prediction process (the inter-layer prediction) is performed, the up-sampling section 149 also acquires the syntax information such as the motion information and the intra prediction mode information of the base layer as the encoding-related information.

The up-sampling section 149 performs the up-sampling process on the acquired encoding-related information of the base layer. In the scalable coding, layers differ in a value of a certain parameter (for example, a resolution or the like) with a scalability function. For this reason, the up-sampling section 149 performs the up-sampling process (performs the scalable parameter conversion process) on the encoding-related information of the base layer so that the value of the parameter is converted based on the enhancement layer. As the up-sampling process is performed as described above, the encoding-related information of the base layer can be used in encoding of the enhancement layer.
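A minimal sketch of such a conversion is given below: the decoded base layer image is brought to the enhancement layer resolution, and, for inter-layer syntax prediction, base layer motion vectors are scaled by the same ratio. The 2x ratio and the nearest-neighbour scaling are assumptions made for the example; an actual implementation would use the normative resampling filters.

```python
# Illustrative sketch of the up-sampling (scalable parameter conversion) process.

import numpy as np


def upsample_texture(base_image, ratio=2):
    # simple nearest-neighbour scaling of the decoded base layer image
    return np.repeat(np.repeat(base_image, ratio, axis=0), ratio, axis=1)


def upsample_motion_vector(mv, ratio=2):
    # motion vectors are scaled to the enhancement layer resolution
    return (mv[0] * ratio, mv[1] * ratio)


base_decoded = np.arange(16, dtype=np.uint8).reshape(4, 4)   # toy 4x4 picture
print(upsample_texture(base_decoded).shape)                   # -> (8, 8)
print(upsample_motion_vector((3, -1)))                        # -> (6, -2)
```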

The up-sampling section 149 supplies the encoding-related information of the base layer that has undergone the up-sampling process to be stored in the frame memory 142. For example, the encoding-related information of the base layer is supplied to the intra prediction section 144 or the inter prediction section 145 as the reference image. The syntax information is similarly supplied to the intra prediction section 144 or the inter prediction section 145.

<Area Synchronization Section>

FIG. 21 is a block diagram illustrating an example of a main configuration of the area synchronization section 148 of FIG. 20.

As illustrated in FIG. 21, the area synchronization section 148 includes a base layer area division information buffer 171, an enhancement layer area division setting section 172, and an area synchronization setting section 173.

The base layer area division information buffer 171 acquires and holds the base layer area division information supplied from the base layer image encoding section 101. The base layer area division information buffer 171 supplies the base layer area division information being held therein to the area synchronization setting section 173 under certain timing or according to an external request from the area synchronization setting section 173 or the like.

The enhancement layer area division setting section 172 sets the area division (for example, a tile, a slice, or the like) of the picture of the enhancement layer. An area division setting method is arbitrary. For example, the area division may be set by the user, the application, or the like or may be decided in advance. The area division of the enhancement layer may be similar to or different from the area division of the base layer.

The enhancement layer area division setting section 172 supplies this setting to the respective sections of the enhancement layer image encoding section 102 as the enhancement layer area division information. The respective sections of the enhancement layer image encoding section 102 execute processing for each area indicated by the enhancement layer area division information. Encoding of each area is independently processed. Therefore, for example, it is possible to process encoding of the areas in parallel using a plurality of CPUs.

The enhancement layer area division setting section 172 supplies the generated enhancement layer area division information to the area synchronization setting section 173 as well.

Further, the enhancement layer area division setting section 172 supplies the generated enhancement layer area division information to the lossless encoding section 136 so that the enhancement layer area division information is encoded and transmitted to the decoding side. As a result, since the decoding side can perform decoding with reference to this information, it is possible to reduce decoding workload.

The area synchronization setting section 173 performs area association between layers using the supplied base layer area division information and enhancement layer area division information. In other words, the area synchronization setting section 173 sets, for each area of the enhancement layer, an area of the base layer in which the encoding-related information is referred to in the event of encoding.

The area synchronization setting section 173 generates synchronization area information indicating this setting. Information of any specification can be used as the synchronization area information as long as the information can be used to control the area of the base layer serving as the reference destination of the encoding-related information. For example, information used to associate the area of the base layer serving as the reference destination of the encoding-related information with each area of the enhancement layer may be used. For example, information of the syntax described in <1. Main description of present technology> may be used.

The setting method is arbitrary. In other words, an area that is referred to in the intra prediction section 144 or the inter prediction section 145 is decided by an arbitrary method. For example, the area may be set by the user, the application, or the like or may be decided in advance.

Using the generated synchronization area information, the area synchronization setting section 173 specifies the area of the base layer that is used as the reference destination of the encoding-related information in the current area serving as the processing target. The area synchronization setting section 173 then generates synchronization address information indicating a position (address) of the data of that area within the data of the encoding-related information (for example, the texture information such as the reference image or the syntax information such as the motion information or the intra prediction mode information) that has undergone the up-sampling process and is stored in the frame memory 142, and supplies the synchronization address information to the intra prediction section 144 or the inter prediction section 145.
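The following Python sketch illustrates one way such addresses could be derived: given the referenced base-layer tiles of the current enhancement area, it computes the rectangular regions of the up-sampled frame that the prediction sections are allowed to read. The grid size, tile dimensions, and function names are hypothetical values chosen only for the example.

```python
# Illustrative sketch of deriving synchronization address information.

def tile_rect(tile_id, tile_cols, tile_w, tile_h):
    """Top-left corner and size of one base-layer tile in the up-sampled frame."""
    col, row = tile_id % tile_cols, tile_id // tile_cols
    return (col * tile_w, row * tile_h, tile_w, tile_h)


def synchronization_addresses(ref_tiles, tile_cols=4, tile_w=480, tile_h=540):
    return [tile_rect(t, tile_cols, tile_w, tile_h) for t in ref_tiles]


# The current enhancement area refers only to base-layer tiles 0 and 1, so only
# these two regions of the frame memory need to be read during prediction.
print(synchronization_addresses([0, 1]))
# -> [(0, 0, 480, 540), (480, 0, 480, 540)]
```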

The intra prediction section 144 or the inter prediction section 145 performs the inter-layer prediction according to the synchronization address information, and thus it is possible to set only some areas of the picture of the base layer as the reference destination, and it is possible to suppress an increase in the number of accesses to the frame memory 142. In other words, as the area synchronization setting section 173 performs this process, it is possible to suppress an increase in the encoding workload.

Further, the area synchronization setting section 173 supplies the generated synchronization area information to the lossless encoding section 136 so that the synchronization area information is encoded and transmitted to the decoding side. As a result, the decoding side can perform decoding with reference to the synchronization area information, and thus, regarding decoding, it is similarly possible to suppress an increase in the number of accesses to the memory, and it is possible to reduce the decoding workload.

<Flow of Image Encoding Process>

Next, the flow of each process performed by the image encoding device 100 will be described. First, an example of the flow of an image encoding process will be described with reference to a flowchart of FIG. 22.

When the image encoding process starts, in step S101, the base layer image encoding section 101 of the image encoding device 100 encodes image data of the base layer.

In step S102, the enhancement layer image encoding section 102 encodes image data of the enhancement layer.

In step S103, the multiplexing unit 103 multiplexes a base layer image encoded stream generated in the process of step S101 and an enhancement layer image encoded stream generated in the process of step S102 (that is, the bitstreams of the respective layers), and generates a layered image encoded stream of one system.

When the process of step S103 ends, the image encoding device 100 ends the image encoding process. One picture is processed through the image encoding process. Therefore, the image encoding device 100 repeatedly performs the image encoding process on pictures of hierarchized moving image data.

<Flow of Base Layer Encoding Process>

Next, an example of the flow of the base layer encoding process executed by the base layer image encoding section 101 in step S101 of FIG. 22 will be described with reference to FIG. 23.

When the base layer encoding process starts, in step S121, the base layer area division setting section 128 of the base layer image encoding section 101 decides the area division of the base layer by a certain method, and generates the base layer area division information. Further, the base layer area division setting section 128 supplies the base layer area division information to the respective sections of the base layer image encoding section 101.

In step S122, the base layer area division setting section 128 supplies the base layer area division information generated in step S121 to the lossless encoding section 116 so that the base layer area division information is transmitted.

The subsequent processes are executed for each of the areas set in step S121. In other words, each process is executed using the area or a certain unit smaller than the area as a processing unit.

In step S123, the A/D converting section 111 performs A/D conversion on an image of each frame (picture) of an input moving image.

In step S124, the screen reordering buffer 112 stores the image that has undergone the A/D conversion in step S123, and performs reordering from a display order to an encoding order on each picture.

In step S125, the intra prediction section 124 performs the intra prediction process of the intra prediction mode.

In step S126, the inter prediction section 125 performs the inter prediction process in which the motion prediction, the motion compensation, and the like are performed in the inter prediction mode.

In step S127, the prediction image selecting section 126 selects a prediction image based on a cost function value or the like. In other words, the prediction image selecting section 126 selects any one of the prediction image generated by the intra prediction of step S125 and the prediction image generated by the inter prediction of step S126.

In step S128, the operation section 113 calculates a difference between the input image in which the frame order is reordered in the process of step S124 and the prediction image selected in the process of step S127. In other words, the operation section 113 generates image data of a differential image between the input image and the prediction image. An amount of the obtained image data of the differential image is reduced to be smaller than the original image data. Therefore, an amount of data can be compressed to be smaller than when an image is encoded without change.

In step S129, the orthogonal transform section 114 performs the orthogonal transform on the image data of the differential image generated in the process of step S128.

In step S130, the quantization section 115 quantizes the orthogonal transform coefficient obtained in the process of step S129 using the quantization parameter calculated by the rate control section 127.

In step S131, the inverse quantization section 118 inversely quantizes the quantized coefficient (which is also referred to as a “quantization coefficient”) generated in the process of step S130 according to characteristics corresponding to characteristics of the quantization section 115.

In step S132, the inverse orthogonal transform section 119 performs the inverse orthogonal transform on the orthogonal transform coefficient obtained in the process of step S131.

In step S133, the operation section 120 generates image data of a reconstructed image by adding the prediction image selected in the process of step S127 to the differential image restored in the process of step S132.

In step S134, the loop filter 121 performs the loop filter process on the image data of the reconstructed image generated in the process of step S133. As a result, for example, block distortion of the reconstructed image is removed.

In step S135, the frame memory 122 stores data such as the decoded image obtained in the process of step S134, the reconstructed image obtained in the process of step S133, and the like.

In step S136, the lossless encoding section 116 encodes the quantized coefficients obtained in the process of step S130. In other words, lossless coding such as variable length coding or arithmetic coding is performed on data corresponding to the differential image.

At this time, the lossless encoding section 116 encodes information related to the prediction mode of the predictive image selected in the process of step S127, and adds the encoded information to the encoded data obtained by encoding the differential image. In other words, the lossless encoding section 116 also encodes, for example, information according to the optimal intra prediction mode information supplied from the intra prediction section 124 or the optimal inter prediction mode supplied from the inter prediction section 125, and adds the encoded information to the encoded data.

Further, the lossless encoding section 116 sets and encodes syntax elements such as various NAL units, and adds the encoded syntax elements to the encoded data.

In step S137, the accumulation buffer 117 accumulates the encoded data obtained in the process of step S136. The encoded data accumulated in the accumulation buffer 117 is appropriately read and transmitted to the decoding side via a transmission path or a recording medium.

In step S138, the rate control section 127 controls the quantization operation of the quantization section 115 based on the coding amount (the generated coding amount) of the encoded data accumulated in the accumulation buffer 117 in the process of step S137 so that no overflow or underflow occurs. Further, the rate control section 127 supplies information related to the quantization parameter to the quantization section 115.

In step S139, the frame memory 122, the intra prediction section 124, the inter prediction section 125, and the base layer area division setting section 128 supply the encoding-related information of the base layer obtained in the above base layer encoding process to the enhancement layer image encoding section 102 for use in the encoding process of the enhancement layer.

When the process of step S139 ends, the base layer encoding process ends, and the process returns to FIG. 22.

<Flow of Enhancement Layer Encoding Process>

Next, an example of the flow of the enhancement layer encoding process executed by the enhancement layer image encoding section 102 in step S102 of FIG. 22 will be described with reference to flowcharts of FIGS. 24 and 25.

When the enhancement layer encoding process starts, in step S151, the base layer area division information buffer 171 of the enhancement layer image encoding section 102 acquires the base layer area division information that is generated in the base layer encoding process and supplied.

In step S152, the up-sampling section 149 acquires the decoded base layer image (that is, the texture information) that is generated in the base layer encoding process and supplied as the encoding-related information. When the inter-layer syntax prediction is performed, the up-sampling section 149 also acquires the syntax information that is generated in the base layer encoding process and supplied as the encoding-related information.

In step S153, the up-sampling section 149 performs the up-sampling process on the encoding-related information (for example, the decoded base layer image) of the base layer acquired in step S152.

In step S154, the frame memory 142 stores the encoding-related information (for example, the decoded base layer image) of the base layer that has undergone the up-sampling process through the process of step S153.

In step S155, the enhancement layer area division setting section 172 decides the area division of the enhancement layer by a certain method, and generates the enhancement layer area division information. Further, the enhancement layer area division setting section 172 supplies the enhancement layer area division information to the respective sections of the enhancement layer image encoding section 102.

In step S156, the area synchronization setting section 173 generates the synchronization area information by a certain method using the base layer area division information acquired in step S151 and the enhancement layer area division information generated in step S155. In other words, the area synchronization setting section 173 sets the area of the base layer serving as the reference destination of the encoding-related information to each area of the enhancement layer.

In step S157, the area synchronization setting section 173 generates the synchronization address information indicating data of the area of the base layer serving as the reference destination of the encoding-related information using the synchronization area information generated in the process of step S156.

In step S158, the area synchronization setting section 173 supplies the synchronization area information generated in the process of step S156 to the lossless encoding section 136 so that the synchronization area information is transmitted. Further, the enhancement layer area division setting section 172 supplies the enhancement layer area division information generated in the process of step S155 to the lossless encoding section 136 so that the enhancement layer area division information is transmitted.

When the process of step S158 ends, the process proceeds to step S161 of FIG. 25.

The subsequent processes are executed for each of the areas set in step S155. In other words, each process is executed using the area or a certain unit smaller than the area as a processing unit.

The process of step S161 to step S176 of FIG. 25 corresponds to and is executed similarly to the process of step S123 to step S138 of FIG. 23.

When the process of step S176 ends, the enhancement layer encoding process ends, and the process returns to FIG. 22.

By executing the respective processes as described above, the image encoding device 100 can reduce the number of memory accesses for referring to the encoding-related information of another layer in the inter-layer prediction and thus suppress an increase in the encoding and decoding workload.

3. Second Embodiment

<Image Decoding Device>

Next, decoding of encoded data encoded as described above will be described. FIG. 26 is a block diagram illustrating an example of a main configuration of an image decoding device that corresponds to the image encoding device 100 of FIG. 18 as an example of an image processing device to which the present technology is applied.

An image decoding device 200 illustrated in FIG. 26 decodes the encoded data generated by the image encoding device 100 by a decoding method corresponding to an encoding method thereof (that is, performs scalable decoding on the encoded data that has undergone the scalable coding).

As illustrated in FIG. 26, the image decoding device 200 includes a demultiplexing unit 201, a base layer image decoding section 202, and an enhancement layer image decoding section 203.

The demultiplexing unit 201 receives the layered image encoded stream in which the base layer image encoded stream and the enhancement layer image encoded stream are multiplexed, which is transmitted from the encoding side, demultiplexes the scalable image encoded stream, and extracts the base layer image encoded stream and the enhancement layer image encoded stream.

The base layer image decoding section 202 decodes the base layer image encoded stream extracted by the demultiplexing unit 201, and obtains the base layer image. In this event, the base layer image decoding section 202 performs the decoding for each area (a tile, a slice, or the like) set in the encoding side based on the base layer area division information supplied from the encoding side.

The enhancement layer image decoding section 203 decodes the enhancement layer image encoded stream extracted by the demultiplexing unit 201, and obtains the enhancement layer image. In this event, the enhancement layer image decoding section 203 performs the decoding for each area (a tile, a slice, or the like) set in the encoding side based on the enhancement layer area division information supplied from the encoding side.

Further, the enhancement layer image decoding section 203 performs the inter-layer prediction using the synchronization area information serving as the control information that is supplied from the encoding side and used to control the area of the base layer serving as the reference destination of the encoding-related information of each area of the enhancement layer. In other words, when the inter-layer prediction is performed in the decoding of the enhancement layer, the enhancement layer image decoding section 203 refers to the encoding-related information of the area of the base layer designated by the synchronization area information.

<Base Layer Image Decoding Section>

FIG. 27 is a block diagram illustrating an example of a main configuration of the base layer image decoding section 202 of FIG. 26. As illustrated in FIG. 27, the base layer image decoding section 202 includes an accumulation buffer 211, a lossless decoding section 212, an inverse quantization section 213, an inverse orthogonal transform section 214, an operation section 215, a loop filter 216, a screen reordering buffer 217, and a D/A conversion section 218. The base layer image decoding section 202 further includes a frame memory 219, a selecting section 220, an intra prediction section 221, an inter prediction section 222, and a prediction image selecting section 223.

The accumulation buffer 211 is a reception section that receives the transmitted encoded data. The accumulation buffer 211 receives and accumulates the transmitted encoded data, and supplies the encoded data to the lossless decoding section 212 under certain timing. Information necessary for decoding such as the prediction mode information is added to the encoded data. The lossless decoding section 212 decodes the information that is supplied from the accumulation buffer 211 and encoded by the lossless encoding section 116 according to the decoding scheme corresponding to the encoding scheme. The lossless decoding section 212 supplies quantized coefficient data of a differential image obtained by the decoding to the inverse quantization section 213.

Further, the lossless decoding section 212 determines whether the intra prediction mode or the inter prediction mode is selected as an optimum prediction mode, and supplies information related to the optimum prediction mode to the section corresponding to the mode determined to be selected, that is, the intra prediction section 221 or the inter prediction section 222. In other words, for example, when the intra prediction mode is selected as the optimum prediction mode at the encoding side, the information related to the optimum prediction mode is supplied to the intra prediction section 221. Further, for example, when the inter prediction mode is selected as the optimum prediction mode at the encoding side, the information related to the optimum prediction mode is supplied to the inter prediction section 222.

Further, the lossless decoding section 212, for example, supplies information necessary for inverse quantization such as a quantization matrix or a quantization parameter to the inverse quantization section 213.

Further, the lossless decoding section 212 supplies the base layer area division information supplied from the encoding side to the respective processing sections of the base layer image decoding section 202. The respective sections of the base layer image decoding section 202 perform processing for each area indicated by the base layer area division information. Decoding of each area is independently processed. Therefore, for example, it is possible to perform the decoding of respective areas in parallel using a plurality of CPUs.

The inverse quantization section 213 inversely quantizes the quantized coefficient data obtained through the decoding performed by the lossless decoding section 212 according to a scheme corresponding to the quantization scheme of the quantization section 115. The inverse quantization section 213 is a processing section similar to the inverse quantization section 118. In other words, the description of the inverse quantization section 213 can be applied to the inverse quantization section 118 as well. However, it is necessary to interpret the data input source, the data output destination, and the like as each processing section of the base layer image decoding section 202.

The inverse quantization section 213 supplies the obtained coefficient data to the inverse orthogonal transform section 214.

If necessary, the inverse orthogonal transform section 214 performs the inverse orthogonal transform on the orthogonal transform coefficient supplied from the inverse quantization section 213 according to a scheme corresponding to the orthogonal transform scheme of the orthogonal transform section 114. The inverse orthogonal transform section 214 is a processing section similar to the inverse orthogonal transform section 119. In other words, the description of the inverse orthogonal transform section 214 can be applied to the inverse orthogonal transform section 119 as well. However, it is necessary to interpret the data input source, the data output destination, and the like as each processing section of the base layer image decoding section 202.

The image data of the differential image is restored through the inverse orthogonal transform process. The restored image data of the differential image corresponds to the image data of the differential image before the orthogonal transform is performed in the image encoding device. Hereinafter, the restored image data of the differential image obtained by the inverse orthogonal transform process of the inverse orthogonal transform section 214 is referred to as “decoded residual data.” The inverse orthogonal transform section 214 supplies the decoded residual data to the operation section 215. Further, the operation section 215 is supplied with the image data of the prediction image from the intra prediction section 221 or the inter prediction section 222 via the prediction image selecting section 223.

The operation section 215 obtains the image data of the reconstructed image in which the differential image and the prediction image are added using the decoded residual data and the image data of the prediction image. The reconstructed image corresponds to the input image before the prediction image is subtracted by the operation section 113. The operation section 215 supplies the reconstructed image to the loop filter 216.

The loop filter 216 generates a decoded image by appropriately performing a loop filter process including a deblock filter process, an adaptive loop filter process, or the like on the supplied reconstructed image. For example, the loop filter 216 removes block distortion by performing the deblock filter process on the reconstructed image. Further, for example, the loop filter 216 improves the image quality by performing the loop filter process on the deblock filter process result (the reconstructed image from which the block distortion has been removed) using a Wiener filter.

A type of the filter process performed by the loop filter 216 is arbitrary, and a process other than the above-described filter process may be performed. Further, the loop filter 216 may perform the filter process using the filter coefficient supplied from the image encoding device. Furthermore, the loop filter 216 may omit the filter process and may output input data without performing the filter process.

The loop filter 216 supplies the decoded image (or the reconstructed image) serving as the filter process result to the screen reordering buffer 217 and the frame memory 219.

The screen reordering buffer 217 performs reordering of the frame order on the decoded image. In other words, the screen reordering buffer 217 reorders the images of the respective frames reordered in the encoding order by the screen reordering buffer 112 in the original display order. In other words, the screen reordering buffer 217 stores the image data of the decoded images of the respective frames supplied in the encoding order in that order, reads the image data of the decoded images of the respective frames stored in the encoding order in the display order, and supplies it to the D/A conversion section 218. The D/A conversion section 218 performs the D/A conversion on the decoded image (digital data) of the respective frames supplied from the screen reordering buffer 217, and outputs analog data to be displayed on a display (not illustrated).
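The reordering from the decoding order back to the display order can be sketched as below. The picture-order-count values and the list-based buffer model are assumptions made only for the example; the actual screen reordering buffer 217 manages pictures in frame memory.

```python
# Illustrative sketch of decoding-order to display-order reordering.

def reorder_to_display(decoded_in_decoding_order):
    # each item is (picture_order_count, picture); emit in display order
    return [pic for _, pic in sorted(decoded_in_decoding_order)]


decoding_order = [(0, "I0"), (2, "P2"), (1, "B1"), (4, "P4"), (3, "B3")]
print(reorder_to_display(decoding_order))   # -> ['I0', 'B1', 'P2', 'B3', 'P4']
```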

The frame memory 219 stores the supplied decoded image, and supplies the stored decoded image to the intra prediction section 221 or the inter prediction section 222 as the reference image via the selecting section 220 under certain timing or based on an external request from the intra prediction section 221, the inter prediction section 222, or the like.

The intra prediction mode information and the like are appropriately supplied from the lossless decoding section 212 to the intra prediction section 221. The intra prediction section 221 performs the intra prediction in the intra prediction mode (the optimum intra prediction mode) used in the intra prediction section 124, and generates the prediction image. In this event, the intra prediction section 221 performs the intra prediction using the image data of the reconstructed image supplied from the frame memory 219 via the selecting section 220. In other words, the intra prediction section 221 uses the reconstructed image as the reference image (a neighboring pixel). The intra prediction section 221 supplies the generated prediction image to the prediction image selecting section 223.

The optimum prediction mode information, the motion information, and the like are appropriately supplied from the lossless decoding section 212 to the inter prediction section 222. The inter prediction section 222 performs the inter prediction using the decoded image (the reference image) acquired from the frame memory 219 in the inter prediction mode (the optimum inter prediction mode) indicated by the optimum prediction mode information acquired from the lossless decoding section 212, and generates the prediction image.

The prediction image selecting section 223 supplies the prediction image supplied from the intra prediction section 221 or the prediction image supplied from the inter prediction section 222 to the operation section 215. Then, the operation section 215 obtains the reconstructed image in which the prediction image is added to the decoded residual data (the differential image information) from the inverse orthogonal transform section 214.

Further, the base layer image decoding section 202 performs the decoding without referring to another layer. In other words, the intra prediction section 221 and the inter prediction section 222 do not refer to the encoding-related information of another layer.

Further, the frame memory 219 supplies the stored image data of the decoded image of the base layer to the enhancement layer image decoding section 203 as the encoding-related information of the base layer.

Similarly, the intra prediction section 221 supplies the intra prediction mode information and the like to the enhancement layer image decoding section 203 as the encoding-related information of the base layer.

Similarly, the inter prediction section 222 supplies the motion information and the like to the enhancement layer image decoding section 203 as the encoding-related information of the base layer.

Further, the intra prediction section 221 or the inter prediction section 222 (an arbitrary processing section of the base layer image decoding section 202 such as the lossless decoding section 212) supplies the base layer area division information to the enhancement layer image decoding section 203.

<Enhancement Layer Image Decoding Section>

FIG. 28 is a block diagram illustrating an example of a mainconfiguration of the enhancement layer image decoding section 203 ofFIG. 26 . As illustrated in FIG. 28 , the enhancement layer imagedecoding section 203 has basically a configuration similar to that ofthe base layer image decoding section 202 of FIG. 27 .

In other words, the enhancement layer image decoding section 203includes an accumulation buffer 231, a lossless decoding section 232, aninverse quantization section 233, an inverse orthogonal transformsection 234, an operation section 235, a loop filter 236, a screenreordering buffer 237, and a D/A conversion section 238 as illustratedin FIG. 28 . The enhancement layer image decoding section 203 furtherincludes a frame memory 239, a selecting section 240, an intraprediction section 241, an inter prediction section 242, and aprediction image selecting section 243.

The accumulation buffer 231 to the prediction image selecting section 243 correspond to the accumulation buffer 211 to the prediction image selecting section 223 of FIG. 27, and perform processes similar to those performed by the corresponding processing sections. However, the respective sections of the enhancement layer image decoding section 203 perform processing of decoding the enhancement layer image information rather than that of the base layer. Therefore, the description of the accumulation buffer 211 to the prediction image selecting section 223 of FIG. 27 can be applied as a description of processes of the accumulation buffer 231 to the prediction image selecting section 243, but, in this case, data to be processed needs to be data of the enhancement layer rather than data of the base layer. Further, it is necessary to interpret a processing section of a data input source and a data output destination as a corresponding processing section of the enhancement layer image decoding section 203 appropriately.

The enhancement layer image decoding section 203 further includes an area synchronization section 244 and an up-sampling section 245.

The area synchronization section 244 acquires the enhancement layer area division information and the synchronization area information supplied from the lossless decoding section 232. This information is generated at the encoding side and transmitted from the encoding side. Further, the area synchronization section 244 acquires the base layer area division information supplied from the base layer image decoding section 202.

The area synchronization section 244 controls an area in which the encoding-related information of the base layer is referred to in the decoding of the enhancement layer using the information. For example, the area synchronization section 244 controls an area of the base layer in which the encoding-related information is referred to when the intra prediction section 241 or the inter prediction section 242 performs the inter-layer prediction using the information. As a result, similarly to the time of encoding, the area synchronization section 244 can control an area in which the encoding-related information of the base layer is referred to in the decoding of the enhancement layer. Therefore, the area synchronization section 244 can reduce the number of memory accesses and suppress an increase in the decoding workload.
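
The control performed by the area synchronization section 244 can be pictured as a gate between the prediction sections and the frame memory: a reference into the up-sampled base layer is honored only if it falls inside an area designated for the current enhancement layer area. The following Python sketch illustrates this idea under simplified assumptions (rectangular areas described as (x, y, width, height), a lookup already decoded from the synchronization area information); the names ReferenceAreaController, allowed_areas, and fetch_block are hypothetical and do not appear in this specification.

    # Minimal sketch: gate base-layer reference fetches by the designated areas.
    # Assumes the up-sampled base layer is held as a 2-D list of samples.

    class ReferenceAreaController:
        def __init__(self, allowed_areas):
            # allowed_areas: rectangles (x, y, w, h) of the up-sampled base layer
            # that the current enhancement-layer area may reference.
            self.allowed_areas = allowed_areas

        def is_allowed(self, x, y):
            return any(ax <= x < ax + aw and ay <= y < ay + ah
                       for ax, ay, aw, ah in self.allowed_areas)

        def fetch_block(self, frame_memory, x, y, w, h):
            # Touch the frame memory only when the whole block lies inside a
            # designated area; otherwise the inter-layer reference is refused.
            corners = [(x, y), (x + w - 1, y), (x, y + h - 1), (x + w - 1, y + h - 1)]
            if not all(self.is_allowed(cx, cy) for cx, cy in corners):
                return None
            return [row[x:x + w] for row in frame_memory[y:y + h]]

When every inter-layer fetch goes through such a gate, only the designated portion of the base layer picture is ever read, which is the reduction in memory accesses described above.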

The enhancement layer image decoding section 203 performs the decoding with reference to the encoding-related information of another layer (for example, the base layer).

The up-sampling section 245 acquires the encoding-related information of the base layer supplied from the base layer image decoding section 202. For example, the up-sampling section 245 acquires the texture information such as the decoded image (also referred to as a "decoded base layer image") of the base layer as the encoding-related information. Further, for example, when the inter-layer syntax prediction process (the inter-layer prediction) is performed, the up-sampling section 245 acquires the syntax information such as the motion information and the intra prediction mode information of the base layer as the encoding-related information as well.

The up-sampling section 245 performs the up-sampling process on the acquired encoding-related information of the base layer. In the scalable coding, different layers differ in a value of a certain parameter (for example, a resolution or the like) having a scalability function. Thus, the up-sampling section 245 performs the up-sampling process (performs the scalable parameter conversion process) on the encoding-related information of the base layer so that the value of the parameter is converted on the basis of the enhancement layer. As the up-sampling process is performed as described above, the encoding-related information of the base layer can be used in the decoding of the enhancement layer.
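
As a rough illustration of this scalable parameter conversion, the sketch below up-samples a decoded base layer image by a spatial scaling ratio and rescales base layer motion vectors by the same ratio so that both can be interpreted in enhancement layer coordinates. It assumes nearest-neighbor interpolation and a ratio of 2 purely for illustration; the actual filter and ratio are not specified here, and the function names are hypothetical.

    # Minimal sketch of the up-sampling (scalable parameter conversion) process.
    # Assumes spatial scalability with an integer scaling ratio and
    # nearest-neighbor interpolation for brevity.

    def upsample_texture(base_image, ratio=2):
        # base_image: 2-D list of samples of the decoded base layer image.
        height, width = len(base_image), len(base_image[0])
        return [[base_image[y // ratio][x // ratio]
                 for x in range(width * ratio)]
                for y in range(height * ratio)]

    def upsample_motion_vector(mv, ratio=2):
        # Syntax information such as a motion vector is converted to the
        # enhancement-layer scale as well.
        mvx, mvy = mv
        return (mvx * ratio, mvy * ratio)

    upsampled = upsample_texture([[10, 20], [30, 40]])   # 2x2 picture becomes 4x4
    scaled_mv = upsample_motion_vector((3, -1))          # (6, -2)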

The up-sampling section 245 supplies the encoding-related information of the base layer that has undergone the up-sampling process to be stored in the frame memory 239. For example, the encoding-related information of the base layer is supplied to the intra prediction section 241 or the inter prediction section 242 as the reference image. Similarly, the syntax information is supplied to the intra prediction section 241 or the inter prediction section 242 as well.

<Area Synchronization Section>

FIG. 29 is a block diagram illustrating an example of a main configuration of the area synchronization section 244 of FIG. 28.

The area synchronization section 244 includes a base layer area divisioninformation buffer 271, an enhancement layer area division informationbuffer 272, and a synchronization area information decoding section 273as illustrated in FIG. 29 .

The base layer area division information buffer 271 acquires the baselayer area division information supplied from the base layer imagedecoding section 202, that is, the base layer area division informationsupplied from the encoding side, and holds the acquired base layer areadivision information. The base layer area division information buffer271 supplies the held base layer area division information to thesynchronization area information decoding section 273 under certaintiming or according to an external request from the synchronization areainformation decoding section 273 or the like.

The enhancement layer area division information buffer 272 acquires theenhancement layer area division information supplied from the losslessdecoding section 232, that is, the enhancement layer area divisioninformation supplied from the encoding side, and holds the acquiredenhancement layer area division information. The enhancement layer areadivision information buffer 272 supplies the held enhancement layer areadivision information to the synchronization area information decodingsection 273 under certain timing or according to an external requestfrom the synchronization area information decoding section 273 or thelike.

The synchronization area information decoding section 273 acquires thebase layer area division information from the base layer area divisioninformation buffer 271, and acquires the enhancement layer area divisioninformation from the enhancement layer area division information buffer272. Further, the synchronization area information decoding section 273acquires the synchronization area information supplied from the losslessdecoding section 232, that is, acquires the synchronization areainformation supplied from the encoding side, and holds the acquiredsynchronization area information.

The synchronization area information is information used to control the area of the base layer serving as the reference destination of the encoding-related information of each area of the enhancement layer. The synchronization area information decoding section 273 decodes the synchronization area information using the base layer area division information and the enhancement layer area division information. In other words, the synchronization area information decoding section 273 detects a positional relation between the areas of the layers using the base layer area division information and the enhancement layer area division information, and analyzes the correspondence relation between the areas of the layers indicated by the synchronization area information according to the positional relation.

More specifically, the synchronization area information decoding section 273 specifies a position of data of the area of the base layer serving as the reference destination of the encoding-related information for the current area serving as the processing target of the enhancement layer in data of the encoding-related information such as the reference image supplied from the frame memory 239. The synchronization area information decoding section 273 generates the synchronization address information serving as information indicating the position of the data, and supplies the synchronization address information to the intra prediction section 241 or the inter prediction section 242.
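
One way to picture the synchronization address is as a sample offset into the buffer of up-sampled base layer data held in the frame memory 239. The sketch below is only an illustration under the assumption of raster-ordered, rectangular areas: it maps the current enhancement layer area to its base layer counterpart via a correspondence table decoded from the synchronization area information and returns the offset of that area's top-left sample. All identifiers are hypothetical.

    # Minimal sketch: derive a synchronization address for the current area.
    # Assumes the up-sampled base layer is stored row by row (raster order)
    # and that each area is a rectangle given by area division information.

    def synchronization_address(current_area_id, sync_area_table,
                                base_area_rects, picture_width):
        # sync_area_table: enhancement-layer area id -> base-layer area id,
        #                  decoded from the synchronization area information.
        # base_area_rects: base-layer area id -> (x, y, w, h) from the base
        #                  layer area division information.
        base_area_id = sync_area_table[current_area_id]
        x, y, w, h = base_area_rects[base_area_id]
        offset = y * picture_width + x   # position of the area's first sample
        return {"area": base_area_id, "offset": offset, "size": (w, h)}

    table = {0: 0, 1: 0, 2: 1}            # three EL areas referencing two BL areas
    rects = {0: (0, 0, 960, 540), 1: (960, 0, 960, 540)}
    addr = synchronization_address(2, table, rects, picture_width=1920)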

As a result, since all information used by the synchronization area information decoding section 273 is information supplied from the encoding side, the synchronization area information decoding section 273 can generate synchronization address information similar to that generated by the area synchronization setting section 173. In other words, the synchronization area information decoding section 273 can perform control similar to that performed by the area synchronization setting section 173.

Since the intra prediction section 241 or the inter prediction section 242 performs the inter-layer prediction according to the synchronization address information, only some areas of the picture of the base layer can be set as the reference destination, and an increase in the number of accesses to the frame memory 239 can be suppressed. In other words, the synchronization area information decoding section 273 can reduce the number of memory accesses and suppress an increase in the decoding workload by performing the above-described process.

<Flow of Image Decoding Process>

Next, the flow of each process performed by the image decoding device 200 will be described. First, an example of the flow of the image decoding process will be described with reference to a flowchart of FIG. 30.

When the image decoding process starts, in step S201, the demultiplexing unit 201 of the image decoding device 200 performs demultiplexing on the layered image encoded stream transmitted from the encoding side for each layer.

In step S202, the base layer image decoding section 202 decodes the base layer image encoded stream extracted in the process of step S201. The base layer image decoding section 202 outputs data of the base layer image generated by the decoding.

In step S203, the enhancement layer image decoding section 203 decodes the enhancement layer image encoded stream extracted in the process of step S201. The enhancement layer image decoding section 203 outputs data of the enhancement layer image generated by the decoding.

When the process of step S203 ends, the image decoding device 200 ends the image decoding process. One picture is processed in this image decoding process. Therefore, the image decoding device 200 repeatedly performs the image decoding process on each picture of the hierarchized moving image data.
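
Put as pseudocode, the per-picture structure of FIG. 30 is a demultiplexing step followed by the base layer and enhancement layer decoding steps, repeated for every picture of the hierarchized stream. The sketch below only mirrors that ordering; the class and method names are hypothetical placeholders, not an interface defined by this specification.

    # Minimal sketch of the image decoding process of FIG. 30 (steps S201 to S203).

    def decode_hierarchical_stream(layered_stream, demultiplexer,
                                   base_decoder, enhancement_decoder):
        decoded_pictures = []
        for picture_payload in layered_stream:           # one iteration per picture
            # Step S201: split the layered encoded stream into per-layer streams.
            base_stream, enh_stream = demultiplexer.split(picture_payload)
            # Step S202: decode the base layer picture (no other layer is referenced).
            base_picture, base_info = base_decoder.decode(base_stream)
            # Step S203: decode the enhancement layer picture, referring only to
            # the controlled areas of the base layer encoding-related information.
            enh_picture = enhancement_decoder.decode(enh_stream, base_info)
            decoded_pictures.append((base_picture, enh_picture))
        return decoded_pictures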

<Flow of Base Layer Decoding Process>

Next, an example of the flow of the base layer decoding process performed by the base layer image decoding section 202 in step S202 of FIG. 30 will be described with reference to a flowchart of FIG. 31.

When the base layer decoding process starts, in step S221, the lossless decoding section 212 of the base layer image decoding section 202 decodes the encoded data acquired through the accumulation buffer 211, and acquires the base layer area division information supplied from the encoding side. Further, the lossless decoding section 212 supplies the base layer area division information to the respective sections of the base layer image decoding section 202.

The subsequent processes are executed for each of the areas set in step S221. In other words, each process is executed using the area or a certain unit smaller than the area as a processing unit.

In step S222, the accumulation buffer 211 accumulates the transmitted bitstream (encoded data). In step S223, the lossless decoding section 212 decodes the bitstream (encoded data) supplied from the accumulation buffer 211. In other words, image data such as an I picture, a P picture, and a B picture encoded by the lossless encoding section 116 is decoded. At this time, various kinds of information included in the bitstream, such as the header information, are decoded in addition to the image data.

In step S224, the inverse quantization section 213 inversely quantizes the quantized coefficients obtained in the process of step S223.

In step S225, the inverse orthogonal transform section 214 performs the inverse orthogonal transform on the coefficients inversely quantized in step S224.

In step S226, the intra prediction section 221 or the inter prediction section 222 performs the prediction process, and generates the prediction image. In other words, the prediction process is performed in the prediction mode that the lossless decoding section 212 determines to have been applied in the event of encoding. More specifically, for example, when the intra prediction is applied in the event of encoding, the intra prediction section 221 generates the prediction image in the intra prediction mode recognized to be optimal in the event of encoding. Further, for example, when the inter prediction is applied in the event of encoding, the inter prediction section 222 generates the prediction image in the inter prediction mode recognized to be optimal in the event of encoding.

In step S227, the operation section 215 adds the differential image obtained by performing the inverse orthogonal transform in step S225 to the prediction image generated in step S226. As a result, the image data of the reconstructed image is obtained.

In step S228, the loop filter 216 appropriately performs the loop filter process including the deblock filter process, the adaptive loop filter process, or the like on the image data of the reconstructed image obtained in the process of step S227.

In step S229, the screen reordering buffer 217 reorders the respective frames of the reconstructed image that has undergone the filter process in step S228. In other words, the order of the frames reordered in the event of encoding is changed to the original display order.

In step S230, the D/A conversion section 218 performs the D/A conversion on the image in which the order of the frames is reordered in step S229. The image is output to a display (not illustrated), and the image is displayed.

In step S231, the frame memory 219 stores data such as the decoded image obtained in the process of step S228, the reconstructed image obtained in the process of step S227, and the like.

In step S232, the frame memory 219, the intra prediction section 221, and the inter prediction section 222 supply the encoding-related information of the base layer, supplied from the encoding side, to the enhancement layer image decoding section 203 for the decoding process of the enhancement layer.

When the process of step S232 ends, the base layer decoding process ends, and the process returns to FIG. 30.
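
For orientation, the base layer decoding flow just described can be condensed into the following Python-style sketch. It assumes the per-area processing noted above and hypothetical helper objects standing in for the processing sections of the base layer image decoding section 202, so it is a reading aid rather than a normative description.

    # Minimal sketch of the base layer decoding process of FIG. 31 (S221 to S232).
    # Every attribute of `sections` is a hypothetical placeholder for a
    # processing section of the base layer image decoding section 202.

    def decode_base_layer(encoded_stream, sections, enhancement_decoder):
        # Step S221: decode the base layer area division information.
        area_division = sections.lossless.decode_area_division(encoded_stream)
        for area in area_division.areas:                        # per-area processing
            bits = sections.buffer.accumulate(encoded_stream, area)       # S222
            coeffs, side_info = sections.lossless.decode(bits)            # S223
            coeffs = sections.inverse_quantizer.run(coeffs)               # S224
            residual = sections.inverse_transform.run(coeffs)             # S225
            prediction = sections.predict(side_info)                      # S226
            reconstructed = sections.adder.add(residual, prediction)      # S227
            decoded = sections.loop_filter.run(reconstructed)             # S228
            sections.reorder_buffer.push(decoded)                         # S229
            sections.da_converter.output(decoded)                         # S230
            sections.frame_memory.store(decoded, reconstructed)           # S231
        # Step S232: hand the base layer encoding-related information (decoded
        # image, area division, intra modes, motion data) to the enhancement layer.
        enhancement_decoder.receive_base_layer_info(
            sections.frame_memory.decoded_image(), area_division,
            sections.intra.modes(), sections.inter.motion())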

<Flow of Enhancement Layer Decoding Process>

Next, an example of the flow of the enhancement layer decoding process performed by the enhancement layer image decoding section 203 in step S203 of FIG. 30 will be described with reference to flowcharts of FIGS. 32 and 33.

When the enhancement layer decoding process starts, in step S251, the base layer area division information buffer 271 of the enhancement layer image decoding section 203 acquires the base layer area division information supplied from the base layer image decoding section 202 in the base layer decoding process. The base layer area division information is information supplied from the encoding side.

In step S252, the up-sampling section 245 acquires the decoded base layer image (that is, the texture information) supplied from the base layer image decoding section 202 in the base layer decoding process as the encoding-related information. Further, when the inter-layer syntax prediction is performed, the up-sampling section 245 acquires the syntax information supplied from the base layer image decoding section 202 in the base layer decoding process as the encoding-related information as well. The encoding-related information is information supplied from the encoding side or information restored based on information supplied from the encoding side.

In step S253, the up-sampling section 245 performs the up-sampling process on the encoding-related information of the base layer (for example, the decoded base layer image) acquired in step S252. The frame memory 239 stores the encoding-related information of the base layer (for example, the decoded base layer image) that has undergone the up-sampling process through the process of step S253.

In step S254, the enhancement layer area division information buffer 272 acquires the enhancement layer area division information supplied from the lossless decoding section 232. The enhancement layer area division information is information supplied from the encoding side.

In step S255, the synchronization area information decoding section 273 acquires the synchronization area information supplied from the lossless decoding section 232. The synchronization area information is information supplied from the encoding side.

In step S256, the synchronization area information decoding section 273 analyzes the synchronization area information acquired in step S255 using the base layer area division information acquired in step S251 and the enhancement layer area division information acquired in step S254, sets a position (a synchronization address) of data of the area of the base layer serving as the reference destination, and generates the synchronization address information indicating the synchronization address. The synchronization area information decoding section 273 supplies the generated synchronization address information to the intra prediction section 241 or the inter prediction section 242. The intra prediction section 241 or the inter prediction section 242 to which the synchronization address information has been supplied performs the inter-layer prediction using the synchronization address information.

When the process of step S256 ends, the process proceeds to step S261 of FIG. 33.

The subsequent processes are executed for each of the areas indicated by the enhancement layer area division information. In other words, each process is executed using the area or a certain unit smaller than the area as a processing unit.

The process of step S261 to step S270 of FIG. 33 corresponds to the process of step S222 to step S231 of FIG. 31 and is performed in a similar manner.

However, when the inter-layer prediction is performed, in step S265, the intra prediction section 241 or the inter prediction section 242 performs the process according to the synchronization address information generated in step S256 of FIG. 32. In other words, the intra prediction section 241 or the inter prediction section 242 performs the inter-layer prediction with reference to only the encoding-related information of the areas of the base layer designated by the synchronization address information.

When the process of step S270 ends, the enhancement layer decoding process ends, and the process returns to FIG. 30.
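
Similarly, the enhancement layer decoding flow of FIGS. 32 and 33 can be summarized as below: acquire the base layer information, up-sample it, decode the synchronization area information into addresses, then decode each area while restricting inter-layer references to the designated base layer areas. Every identifier is a hypothetical placeholder chosen for illustration.

    # Minimal sketch of the enhancement layer decoding process (S251 to S270).

    def decode_enhancement_layer(enh_stream, base_layer_info, sections):
        # Step S251: base layer area division information from the base layer decoding.
        base_division = base_layer_info.area_division
        # Step S252: decoded base layer image (texture) and, if used, syntax info.
        texture, syntax = base_layer_info.texture, base_layer_info.syntax
        # Step S253: up-sample the base layer information; texture goes to the
        # frame memory, syntax to the prediction sections.
        sections.frame_memory.store(sections.upsampler.run(texture))
        sections.predictors.set_base_syntax(sections.upsampler.run(syntax))
        # Steps S254 and S255: enhancement layer division and synchronization info.
        enh_division = sections.lossless.decode_area_division(enh_stream)
        sync_info = sections.lossless.decode_sync_area_info(enh_stream)
        # Step S256: turn the synchronization area information into addresses.
        sync_address = sections.sync_decoder.resolve(sync_info, base_division,
                                                     enh_division)
        # Steps S261 to S270: decode each enhancement layer area; inter-layer
        # prediction (step S265) may only read the designated base layer areas.
        for area in enh_division.areas:
            sections.predictors.restrict_reference(sync_address[area])
            sections.decode_area(enh_stream, area)
        return sections.reorder_and_output()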

As the process is performed as described above, the image decoding device 200 can decrease the number of memory accesses for referring to the encoding-related information of another layer in the inter-layer prediction and suppress an increase in the decoding workload.

In the above example, the image data is hierarchized and divided into a plurality of layers through the scalable coding, but the number of layers is arbitrary. Further, in the above example, regarding encoding and decoding, the enhancement layer is processed with reference to the base layer, but the present disclosure is not limited to this example, and the enhancement layer may be processed with reference to another enhancement layer that has been processed.

For example, in the case of the image encoding device 100 of FIG. 18 ,the frame memory 142, the intra prediction section 144, and the interprediction section 145 (FIG. 20 ) of the enhancement layer imageencoding section 102 of the enhancement layer in which theencoding-related information is referred to may supply theencoding-related information of the enhancement layer to the enhancementlayer image encoding section 102 of another enhancement layer in whichthe encoding-related information is referred to, similarly to the framememory 122, the intra prediction section 124, and the inter predictionsection 125 (FIG. 19 ).

Further, for example, in the case of the image decoding device 200 ofFIG. 26, the frame memory 239, the intra prediction section 241, and theinter prediction section 242 (FIG. 28 ) of the enhancement layer imagedecoding section 203 of the enhancement layer in which theencoding-related information is referred to may supply theencoding-related information of the enhancement layer to the enhancementlayer image decoding section 203 of another enhancement layer in whichthe encoding-related information of the enhancement layer is referredto, similarly to the frame memory 219, the intra prediction section 221,and the inter prediction section 222 (FIG. 27 ).

The present technology can be applied to a so-called image encodingdevice and an image decoding device based on a scalable coding/decodingscheme.

For example, the present technology can be applied to an image encodingdevice and an image decoding device used when image information(bitstream) compressed by an orthogonal transform such as a discretecosine transform and motion compensation as in MPEG and H.26x isreceived via a network medium such as satellite broadcasting, cabletelevision, the Internet, or a mobile telephone. Further, the presenttechnology can be applied to an image encoding device and an imagedecoding device used when processing is performed on a storage mediumsuch as an optical disc, a magnetic disk, or a flash memory.

4. Third Embodiment

<Application to Multi-View Image Coding/Multi-View Image Decoding>

The series of processes described above can be applied to multi-view image coding and multi-view image decoding. FIG. 34 illustrates an exemplary multi-view image coding scheme.

As illustrated in FIG. 34 , a multi-view image includes images of aplurality of views. A plurality of views of the multi-view imageincludes a base view in which encoding and decoding are performed usingonly an image of its own view without using information of another viewand a non-base view in which encoding and decoding are performed usinginformation of another view. Encoding and decoding of the non-base viewmay be performed using information of the base view or using informationof another non-base view.

In other words, a reference relation between views in the multi-viewimage coding and decoding is similar to the reference relation betweenlayers in the scalable image encoding and decoding. Therefore, theabove-described method may be applied to the encoding and decoding of amulti-view image illustrated in FIG. 34 . In other words, in theencoding and decoding of the non-base view, an area of the base view (oranother non-base view) in which the encoding-related information isreferred to may be controlled. As a result, even in the case of themulti-view image, similarly, it is possible to suppress an increase inthe encoding or decoding workload.

<Multi-View Image Encoding Device>

FIG. 35 is a diagram illustrating a multi-view image encoding device that performs the multi-view image encoding. As illustrated in FIG. 35, a multi-view image encoding device 600 includes an encoding section 601, an encoding section 602, and a multiplexing section 603.

The encoding section 601 encodes a base view image and generates a base view image encoded stream. The encoding section 602 encodes a non-base view image and generates a non-base view image encoded stream. The multiplexing section 603 multiplexes the base view image encoded stream generated in the encoding section 601 and the non-base view image encoded stream generated in the encoding section 602, and generates a multi-view image encoded stream.

The base layer image encoding section 101 (FIG. 19) may be applied as the encoding section 601 of the multi-view image encoding device 600, and the enhancement layer image encoding section 102 (FIG. 20) may be applied as the encoding section 602. In other words, in the encoding of the non-base view, an area of the base view (or another non-base view) in which the encoding-related information is referred to may be controlled. As a result, even in the case of the multi-view image, similarly, it is possible to suppress an increase in the encoding workload. Further, even in the case of the multi-view image encoding, it is possible to suppress an increase in the decoding workload by transmitting, to the decoding side, the control information used to control the area in which the encoding-related information is referred to.
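
The reuse described here amounts to wiring the same two encoder sections to view inputs instead of layer inputs and multiplexing their outputs. A minimal sketch, assuming hypothetical wrapper objects for the encoding sections 601 and 602 and the multiplexing section 603, is as follows.

    # Minimal sketch of the multi-view image encoding device 600 of FIG. 35.

    def encode_multi_view(base_view_pictures, non_base_view_pictures,
                          encoder_601, encoder_602, multiplexer_603):
        # Encoding section 601: encodes the base view without referring to other views.
        base_stream = encoder_601.encode(base_view_pictures)
        # Encoding section 602: encodes the non-base view; the areas of the base
        # view whose encoding-related information may be referenced are controlled
        # in the same way as for the enhancement layer.
        non_base_stream = encoder_602.encode(
            non_base_view_pictures, reference_info=encoder_601.encoding_info())
        # Multiplexing section 603: combines both view streams into one stream.
        return multiplexer_603.multiplex(base_stream, non_base_stream)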

<Multi-View Image Decoding Device>

FIG. 36 is a diagram illustrating a multi-view image decoding device that performs the multi-view image decoding. As illustrated in FIG. 36, a multi-view image decoding device 610 includes a demultiplexing unit 611, a decoding section 612, and a decoding section 613.

The demultiplexing unit 611 demultiplexes a multi-view image encoded stream in which a base view image encoded stream and a non-base view image encoded stream are multiplexed, and extracts the base view image encoded stream and the non-base view image encoded stream. The decoding section 612 decodes the base view image encoded stream extracted by the demultiplexing unit 611 and obtains a base view image. The decoding section 613 decodes the non-base view image encoded stream extracted by the demultiplexing unit 611 and obtains a non-base view image.

The base layer image decoding section 202 (FIG. 27) may be applied as the decoding section 612 of the multi-view image decoding device 610, and the enhancement layer image decoding section 203 (FIG. 28) may be applied as the decoding section 613. In other words, in the decoding of the non-base view, an area of the base view (or another non-base view) in which the encoding-related information is referred to may be controlled. As a result, even in the case of the multi-view image, similarly, it is possible to suppress an increase in the decoding workload.

5. Fourth Embodiment

<Computer>

The above described series of processes can be executed by hardware orcan be executed by software. When the series of processes are to beperformed by software, the programs forming the software are installedinto a computer. Here, a computer includes a computer which isincorporated in dedicated hardware or a general-purpose personalcomputer (PC) which can execute various functions by installing variousprograms into the computer, for example.

FIG. 37 is a block diagram illustrating a configuration example ofhardware of a computer for executing the above-described series ofprocesses through a program.

In a computer 800 shown in FIG. 37 , a central processing unit (CPU)801, a read only memory (ROM) 802, and a random access memory (RAM) 803are connected to one another by a bus 804.

An input and output interface 810 is further connected to the bus 804.An input section 811, an output section 812, a storage section 813, acommunication section 814, and a drive 815 are connected to the inputand output interface 810.

The input section 811 is formed with a keyboard, a mouse, a microphone,a touch panel, an input terminal, and the like. The output section 812is formed with a display, a speaker, an output terminal, and the like.The storage section 813 is formed with a hard disk, a RAM disk, anonvolatile memory, or the like. The communication section 814 is formedwith a network interface or the like. The drive 815 drives a removablemedium 821 such as a magnetic disk, an optical disk, a magneto-opticaldisk, or a semiconductor memory.

In the computer configured as described above, the CPU 801 loads theprograms stored in the storage section 813 into the RAM 803 via theinput and output interface 810 and the bus 804, and executes theprograms, so that the above described series of processes are performed.The RAM 803 also stores data necessary for the CPU 801 to execute thevarious processes.

The program executed by the computer (the CPU 801) may be provided bybeing recorded on the removable medium 821 as a packaged medium or thelike. In this case, by loading the removable medium 821 into the drive815, the program can be installed into the storage section 813 via theinput and output interface 810.

Further, the program may be provided through a wired or wirelesstransmission medium such as a local area network, the Internet, ordigital broadcasting. In this case, it is also possible to receive theprogram from a wired or wireless transfer medium using the communicationsection 814 and install the program into the storage section 813.

Furthermore, the program can also be installed in advance into the ROM802 or the storage section 813.

It should be noted that the program executed by a computer may be aprogram that is processed in time sequence according to the describedsequence or a program that is processed in parallel or under necessarytiming such as upon calling.

In the present disclosure, steps of describing the program to berecorded on the recording medium may include processing performed intime sequence according to the description order and processing notprocessed in time sequence but performed in parallel or individually.

In addition, in this disclosure, a system means a set of a plurality ofconstituent elements (devices, modules (parts), or the like) regardlessof whether or not all constituent elements are arranged in the samehousing. Thus, both a plurality of devices that is accommodated inseparate housings and connected via a network and a single device inwhich a plurality of modules is accommodated in a single housing aresystems.

Further, a constituent element described as a single device (orprocessing unit) above may be divided and configured as a plurality ofdevices (or processing units). On the contrary, constituent elementsdescribed as a plurality of devices (or processing units) above may beconfigured collectively as a single device (or processing unit).Further, a constituent element other than those described above may beadded to each device (or processing unit). Furthermore, a part of aconstituent element of a given device (or processing unit) may beincluded in a constituent element of another device (or anotherprocessing unit) as long as the configuration or operation of the systemas a whole is substantially the same.

The preferred embodiments of the present disclosure have been describedabove with reference to the accompanying drawings, whilst the presentinvention is not limited to the above examples, of course. A personskilled in the art may find various alterations and modifications withinthe scope of the appended claims, and it should be understood that theywill naturally come under the technical scope of the present disclosure.

For example, the present disclosure can adopt a configuration of cloudcomputing which processes by allocating and connecting one function by aplurality of apparatuses through a network.

Further, each step described by the above mentioned flow charts can beexecuted by one apparatus or by allocating a plurality of apparatuses.

In addition, in the case where a plurality of processes is included inone step, the plurality of processes included in this one step can beexecuted by one apparatus or by allocating a plurality of apparatuses.

The image encoding device and the image decoding device according to theembodiment may be applied to various electronic devices such astransmitters and receivers for satellite broadcasting, cablebroadcasting such as cable TV, distribution on the Internet,distribution to terminals via cellular communication and the like,recording devices that record images in a medium such as optical discs,magnetic disks and flash memory, and reproduction devices that reproduceimages from such storage medium. Four applications will be describedbelow.

6. Applications

<First Application: Television Receiver>

FIG. 38 illustrates an example of a schematic configuration of a television device to which the embodiment is applied. A television device 900 includes an antenna 901, a tuner 902, a demultiplexer 903, a decoder 904, a video signal processing section 905, a display section 906, an audio signal processing section 907, a speaker 908, an external interface (I/F) section 909, a control section 910, a user interface (I/F) 911, and a bus 912.

The tuner 902 extracts a signal of a desired channel from broadcastsignals received via the antenna 901, and demodulates the extractedsignal. The tuner 902 then outputs an encoded bitstream obtained throughthe demodulation to the demultiplexer 903. That is, the tuner 902 servesas a transmission unit of the television device 900 for receiving anencoded stream in which an image is encoded.

The demultiplexer 903 demultiplexes the encoded bitstream to obtain avideo stream and an audio stream of a program to be viewed, and outputseach stream obtained through the demultiplexing to the decoder 904. Thedemultiplexer 903 also extracts auxiliary data such as electronicprogram guides (EPGs) from the encoded bitstream, and supplies theextracted data to the control section 910. Additionally, thedemultiplexer 903 may perform descrambling when the encoded bitstream isscrambled.

The decoder 904 decodes the video stream and the audio stream input fromthe demultiplexer 903. The decoder 904 then outputs video data generatedin the decoding process to the video signal processing section 905. Thedecoder 904 also outputs the audio data generated in the decodingprocess to the audio signal processing section 907.

The video signal processing section 905 reproduces the video data inputfrom the decoder 904, and causes the display section 906 to display thevideo. The video signal processing section 905 may also cause thedisplay section 906 to display an application screen supplied via anetwork. Further, the video signal processing section 905 may perform anadditional process such as noise removal, for example, on the video datain accordance with the setting. Furthermore, the video signal processingsection 905 may generate an image of a graphical user interface (GUI)such as a menu, a button and a cursor, and superimpose the generatedimage on an output image.

The display section 906 is driven by a drive signal supplied from thevideo signal processing section 905, and displays video or an image on avideo screen of a display device (e.g. liquid crystal display, plasmadisplay, organic electroluminescence display (OLED), etc.).

The audio signal processing section 907 performs a reproduction processsuch as D/A conversion and amplification on the audio data input fromthe decoder 904, and outputs sound from the speaker 908. The audiosignal processing section 907 may also perform an additional processsuch as noise removal on the audio data.

The external interface section 909 is an interface for connecting thetelevision device 900 to an external device or a network. For example, avideo stream or an audio stream received via the external interfacesection 909 may be decoded by the decoder 904. That is, the externalinterface section 909 also serves as a transmission unit of thetelevision device 900 for receiving an encoded stream in which an imageis encoded.

The control section 910 includes a processor such as a centralprocessing unit (CPU), and a memory such as random access memory (RAM)and read only memory (ROM). The memory stores a program to be executedby the CPU, program data, EPG data, data acquired via a network, and thelike. The program stored in the memory is read out and executed by theCPU at the time of activation of the television device 900, for example.The CPU controls the operation of the television device 900, forexample, in accordance with an operation signal input from the userinterface section 911 by executing the program.

The user interface section 911 is connected to the control section 910.The user interface section 911 includes, for example, a button and aswitch used for a user to operate the television device 900, and areceiving section for a remote control signal. The user interfacesection 911 detects an operation of a user via these constituentelements, generates an operation signal, and outputs the generatedoperation signal to the control section 910.

The bus 912 connects the tuner 902, the demultiplexer 903, the decoder904, the video signal processing section 905, the audio signalprocessing section 907, the external interface section 909, and thecontrol section 910 to each other.

The decoder 904 has a function of the image decoding device 200according to the embodiment in the television device 900 configured inthis manner. Accordingly, it is possible to suppress an increase in thedecoding workload when an image is decoded in the television device 900.

<Second Application: Mobile Phone>

FIG. 39 illustrates an example of a schematic configuration of a mobilephone to which the embodiment is applied. A mobile phone 920 includes anantenna 921, a communication section 922, an audio codec 923, a speaker924, a microphone 925, a camera section 926, an image processing section927, a demultiplexing section 928, a recording/reproduction section 929,a display section 930, a control section 931, an operation section 932,and a bus 933.

The antenna 921 is connected to the communication section 922. Thespeaker 924 and the microphone 925 are connected to the audio codec 923.The operation section 932 is connected to the control section 931. Thebus 933 connects the communication section 922, the audio codec 923, thecamera section 926, the image processing section 927, the demultiplexingsection 928, the recording/reproduction section 929, the display section930, and the control section 931 to each other.

The mobile phone 920 performs an operation such as transmission andreception of an audio signal, transmission and reception of email orimage data, image capturing, and recording of data in various operationmodes including an audio call mode, a data communication mode, an imagecapturing mode, and a videophone mode.

An analogue audio signal generated by the microphone 925 is supplied tothe audio codec 923 in the audio call mode. The audio codec 923 convertsthe analogue audio signal into audio data, has the converted audio datasubjected to the A/D conversion, and compresses the converted data. Theaudio codec 923 then outputs the compressed audio data to thecommunication section 922. The communication section 922 encodes andmodulates the audio data, and generates a transmission signal. Thecommunication section 922 then transmits the generated transmissionsignal to a base station (not illustrated) via the antenna 921. Thecommunication section 922 also amplifies a wireless signal received viathe antenna 921 and converts the frequency of the wireless signal toacquire a received signal. The communication section 922 thendemodulates and decodes the received signal, generates audio data, andoutputs the generated audio data to the audio codec 923. The audio codec923 extends the audio data, has the audio data subjected to the D/Aconversion, and generates an analogue audio signal. The audio codec 923then supplies the generated audio signal to the speaker 924 to outputsound.

The control section 931 also generates text data constituting email inaccordance with an operation made by a user via the operation section932, for example. Moreover, the control section 931 causes the displaysection 930 to display the text. Furthermore, the control section 931generates email data in accordance with a transmission instruction froma user via the operation section 932, and outputs the generated emaildata to the communication section 922. The communication section 922encodes and modulates the email data, and generates a transmissionsignal. The communication section 922 then transmits the generatedtransmission signal to a base station (not illustrated) via the antenna921. The communication section 922 also amplifies a wireless signalreceived via the antenna 921 and converts the frequency of the wirelesssignal to acquire a received signal. The communication section 922 thendemodulates and decodes the received signal to restore the email data,and outputs the restored email data to the control section 931. Thecontrol section 931 causes the display section 930 to display thecontent of the email, and also causes the storage medium of therecording/reproduction section 929 to store the email data.

The recording/reproduction section 929 includes a readable and writablestorage medium. For example, the storage medium may be a built-instorage medium such as RAM and flash memory, or an externally mountedstorage medium such as hard disks, magnetic disks, magneto-opticaldisks, optical discs, universal serial bus (USB) memory, and memorycards.

Furthermore, the camera section 926, for example, captures an image of asubject to generate image data, and outputs the generated image data tothe image processing section 927 in the image capturing mode. The imageprocessing section 927 encodes the image data input from the camerasection 926, and causes the storage medium of the recording/reproductionsection 929 to store the encoded stream.

Furthermore, the demultiplexing section 928, for example, multiplexes avideo stream encoded by the image processing section 927 and an audiostream input from the audio codec 923, and outputs the multiplexedstream to the communication section 922 in the videophone mode. Thecommunication section 922 encodes and modulates the stream, andgenerates a transmission signal. The communication section 922 thentransmits the generated transmission signal to a base station (notillustrated) via the antenna 921. The communication section 922 alsoamplifies a wireless signal received via the antenna 921 and convertsthe frequency of the wireless signal to acquire a received signal. Thesetransmission signal and received signal may include an encodedbitstream. The communication section 922 then demodulates and decodesthe received signal to restore the stream, and outputs the restoredstream to the demultiplexing section 928. The demultiplexing section 928demultiplexes the input stream to obtain a video stream and an audiostream, and outputs the video stream to the image processing section 927and the audio stream to the audio codec 923. The image processingsection 927 decodes the video stream, and generates video data. Thevideo data is supplied to the display section 930, and a series ofimages is displayed by the display section 930. The audio codec 923extends the audio stream, has the audio stream subjected to the D/Aconversion, and generates an analogue audio signal. The audio codec 923then supplies the generated audio signal to the speaker 924, and causessound to be output.

In the mobile phone 920 having the above configuration, the image processing section 927 has the functions of the image encoding device 100 (FIG. 18) and the image decoding device 200 (FIG. 26) according to the above embodiment. Thus, when the mobile phone 920 encodes and decodes an image, it is possible to suppress an increase in workload.

<Third Application: Recording/Reproduction Device>

FIG. 40 illustrates an example of a schematic configuration of arecording/reproduction device to which the embodiment is applied. Arecording/reproduction device 940, for example, encodes audio data andvideo data of a received broadcast program and records the encoded audiodata and the encoded video data in a recording medium. For example, therecording/reproduction device 940 may also encode audio data and videodata acquired from another device and record the encoded audio data andthe encoded video data in a recording medium. Furthermore, therecording/reproduction device 940, for example, uses a monitor or aspeaker to reproduce the data recorded in the recording medium inaccordance with an instruction of a user. At this time, therecording/reproduction device 940 decodes the audio data and the videodata.

The recording/reproduction device 940 includes a tuner 941, an externalinterface (I/F) section 942, an encoder 943, a hard disk drive (HDD)944, a disc drive 945, a selector 946, a decoder 947, an on-screendisplay (OSD) 948, a control section 949, and a user interface (I/F)section 950.

The tuner 941 extracts a signal of a desired channel from broadcastsignals received via an antenna (not shown), and demodulates theextracted signal. The tuner 941 then outputs an encoded bitstreamobtained through the demodulation to the selector 946. That is, thetuner 941 serves as a transmission unit of the recording/reproductiondevice 940.

The external interface section 942 is an interface for connecting therecording/reproduction device 940 to an external device or a network.For example, the external interface section 942 may be an IEEE 1394interface, a network interface, an USB interface, a flash memoryinterface, or the like. For example, video data and audio data receivedvia the external interface section 942 are input to the encoder 943.That is, the external interface section 942 serves as a transmissionunit of the recording/reproduction device 940.

When the video data and the audio data input from the external interfacesection 942 have not been encoded, the encoder 943 encodes the videodata and the audio data. The encoder 943 then outputs an encodedbitstream to the selector 946.

The HDD 944 records, in an internal hard disk, the encoded bitstream inwhich content data of video and sound is compressed, various programs,and other data. The HDD 944 also reads out the data from the hard diskat the time of reproducing video or sound.

The disc drive 945 records and reads out data in a recording medium thatis mounted. The recording medium that is mounted on the disc drive 945may be, for example, a DVD disc (DVD-Video, DVD-RAM, DVD-R, DVD-RW, aDVD+R, DVD+RW, etc.), a Blu-ray (registered trademark) disc, or thelike.

The selector 946 selects, at the time of recording video or sound, anencoded bitstream input from the tuner 941 or the encoder 943, andoutputs the selected encoded bitstream to the HDD 944 or the disc drive945. The selector 946 also outputs, at the time of reproducing video orsound, an encoded bitstream input from the HDD 944 or the disc drive 945to the decoder 947.

The decoder 947 decodes the encoded bitstream, and generates video dataand audio data. The decoder 947 then outputs the generated video data tothe OSD 948. The decoder 947 also outputs the generated audio data to anexternal speaker.

The OSD 948 reproduces the video data input from the decoder 947, anddisplays video. The OSD 948 may also superimpose an image of a GUI suchas a menu, a button, and a cursor on a displayed video.

The control section 949 includes a processor such as a CPU, and a memorysuch as RAM and ROM. The memory stores a program to be executed by theCPU, program data, and the like. For example, a program stored in thememory is read out and executed by the CPU at the time of activation ofthe recording/reproduction device 940. The CPU controls the operation ofthe recording/reproduction device 940, for example, in accordance withan operation signal input from the user interface section 950 byexecuting the program.

The user interface section 950 is connected to the control section 949.The user interface section 950 includes, for example, a button and aswitch used for a user to operate the recording/reproduction device 940,and a receiving section for a remote control signal. The user interfacesection 950 detects an operation made by a user via these constituentelements, generates an operation signal, and outputs the generatedoperation signal to the control section 949.

In the recording/reproducing device 940 having the above configuration,the encoder 943 has the function of the image encoding device 100 (FIG.18 ) according to the above embodiment. The decoder 947 has the functionof the image decoding device 200 (FIG. 26 ) according to the aboveembodiment. Thus, when the recording/reproducing device 940 encodes anddecodes an image, it is possible to suppress an increase in workload.

<Fourth Application: Image Capturing Device>

FIG. 41 illustrates an example of a schematic configuration of an imagecapturing device to which the embodiment is applied. An image capturingdevice 960 captures an image of a subject to generate an image, encodesthe image data, and records the image data in a recording medium.

The image capturing device 960 includes an optical block 961, an imagecapturing section 962, a signal processing section 963, an imageprocessing section 964, a display section 965, an external interface(I/F) section 966, a memory 967, a media drive 968, an OSD 969, acontrol section 970, a user interface (I/F) section 971, and a bus 972.

The optical block 961 is connected to the image capturing section 962.The image capturing section 962 is connected to the signal processingsection 963. The display section 965 is connected to the imageprocessing section 964. The user interface section 971 is connected tothe control section 970. The bus 972 connects the image processingsection 964, the external interface section 966, the memory 967, themedia drive 968, the OSD 969, and the control section 970 to each other.

The optical block 961 includes a focus lens, an aperture stop mechanism,and the like. The optical block 961 forms an optical image of a subjecton an image capturing surface of the image capturing section 962. Theimage capturing section 962 includes an image sensor such as a chargecoupled device (CCD) and a complementary metal oxide semiconductor(CMOS), and converts the optical image formed on the image capturingsurface into an image signal which is an electrical signal throughphotoelectric conversion. The image capturing section 962 then outputsthe image signal to the signal processing section 963.

The signal processing section 963 performs various camera signalprocesses such as knee correction, gamma correction, and colorcorrection on the image signal input from the image capturing section962. The signal processing section 963 outputs the image data subjectedto the camera signal process to the image processing section 964.

The image processing section 964 encodes the image data input from thesignal processing section 963, and generates encoded data. The imageprocessing section 964 then outputs the generated encoded data to theexternal interface section 966 or the media drive 968. The imageprocessing section 964 also decodes encoded data input from the externalinterface section 966 or the media drive 968, and generates image data.The image processing section 964 then outputs the generated image datato the display section 965. The image processing section 964 may alsooutput the image data input from the signal processing section 963 tothe display section 965, and cause the image to be displayed.Furthermore, the image processing section 964 may superimpose data fordisplay acquired from the OSD 969 on an image to be output to thedisplay section 965.

The OSD 969 generates an image of a GUI such as a menu, a button, and acursor, and outputs the generated image to the image processing section964.

The external interface section 966 is configured, for example, as an USBinput and output terminal. The external interface section 966 connectsthe image capturing device 960 and a printer, for example, at the timeof printing an image. A drive is further connected to the externalinterface section 966 as needed. A removable medium such as magneticdisks and optical discs is mounted on the drive, and a program read outfrom the removable medium may be installed in the image capturing device960. Furthermore, the external interface section 966 may be configuredas a network interface to be connected to a network such as a LAN andthe Internet. That is, the external interface section 966 serves as atransmission unit of the image capturing device 960.

A recording medium to be mounted on the media drive 968 may be a readable and writable removable medium such as magnetic disks, magneto-optical disks, optical discs, and semiconductor memory. The recording medium may also be fixedly mounted on the media drive 968, configuring a non-transportable storage section such as built-in hard disk drives or solid state drives (SSDs).

The control section 970 includes a processor such as a CPU, and a memorysuch as a RAM and a ROM. The memory stores a program to be executed bythe CPU, program data, and the like. A program stored in the memory isread out and executed by the CPU, for example, at the time of activationof the image capturing device 960. The CPU controls the operation of theimage capturing device 960, for example, in accordance with an operationsignal input from the user interface section 971 by executing theprogram.

The user interface section 971 is connected to the control section 970.The user interface section 971 includes, for example, a button, aswitch, and the like used for a user to operate the image capturingdevice 960. The user interface section 971 detects an operation made bya user via these constituent elements, generates an operation signal,and outputs the generated operation signal to the control section 970.

In the imaging device 960 having the above configuration, the imageprocessing section 964 has the functions of the image encoding device100 (FIG. 18 ) and the image decoding device 200 (FIG. 26 ) according tothe above embodiment. Thus, when the imaging device 960 encodes anddecodes an image, it is possible to suppress an increase in workload.

7. Application Example of Scalable Coding

<First System>

Next, a specific example of using scalable encoded data, in which ascalable coding (image encoding) is performed, will be described. Thescalable coding, for example, is used for selection of data to betransmitted as examples illustrated in FIG. 42 .

In a data transmission system 1000 illustrated in FIG. 42 , adistribution server 1002 reads scalable encoded data stored in ascalable encoded data storage section 1001, and distributes the scalableencoded data to a terminal device such as a personal computer 1004, anAV device 1005, a tablet device 1006, or a mobile phone 1007 via anetwork 1003.

In this event, the distribution server 1002 selects and transmitsencoded data having proper quality according to capability of theterminal device, communication environment, or the like. Even when thedistribution server 1002 transmits unnecessarily high-quality data, ahigh-quality image is not necessarily obtainable in the terminal deviceand it may be a cause of occurrence of delay or overflow. In addition, acommunication band may be unnecessarily occupied or workload of theterminal device may unnecessarily increase. In contrast, even when thedistribution server 1002 transmits unnecessarily low quality data, animage with a sufficient quality may not be obtained. Thus, thedistribution server 1002 appropriately reads and transmits the scalableencoded data stored in the scalable encoded data storage section 1001 asthe encoded data having a proper quality according to the capability ofthe terminal device, the communication environment, or the like.

For example, the scalable encoded data storage section 1001 is configured to store scalable encoded data (BL+EL) 1011 in which the scalable coding is performed. The scalable encoded data (BL+EL) 1011 is encoded data including both a base layer and an enhancement layer, and is data from which a base layer image and an enhancement layer image can be obtained by performing decoding.

The distribution server 1002 selects an appropriate layer according to the capability of the terminal device for transmitting data, the communication environment, or the like, and reads the data of the selected layer. For example, with respect to the personal computer 1004 or the tablet device 1006 having high processing capability, the distribution server 1002 reads the scalable encoded data (BL+EL) 1011 from the scalable encoded data storage section 1001, and transmits the scalable encoded data (BL+EL) 1011 without change. On the other hand, for example, with respect to the AV device 1005 or the mobile phone 1007 having low processing capability, the distribution server 1002 extracts the data of the base layer from the scalable encoded data (BL+EL) 1011, and transmits the extracted data of the base layer as low quality scalable encoded data (BL) 1012 that is data having the same content as the scalable encoded data (BL+EL) 1011 but has lower quality than the scalable encoded data (BL+EL) 1011.

Because the amount of data can easily be adjusted by employing the scalable encoded data, the occurrence of delay or overflow can be suppressed, and an unnecessary increase in the workload of the terminal device or the communication medium can be suppressed. In addition, because redundancy between the layers is reduced in the scalable encoded data (BL+EL) 1011, the amount of data can be made smaller than when the encoded data of each layer is treated as individual data. Therefore, it is possible to more efficiently use the storage region of the scalable encoded data storage section 1001.

Because various devices ranging from the personal computer 1004 to the mobile phone 1007 are applicable as the terminal device, the hardware performance of the terminal devices differs according to the device. In addition, because there are various applications executed by the terminal device, the software performance thereof also varies. Further, because any communication network, whether wired, wireless, or both, such as the Internet or a local area network (LAN), is applicable as the network 1003 serving as a communication medium, the data transmission performance thereof varies. Further, the data transmission performance may vary due to other communications or the like.

Therefore, the distribution server 1002 may perform communication with the terminal device which is the data transmission destination before starting the data transmission, and then obtain information related to the terminal device performance, such as the hardware performance of the terminal device or the performance of the application (software) executed by the terminal device, and information related to the communication environment, such as an available bandwidth of the network 1003. Then, the distribution server 1002 may select an appropriate layer based on the obtained information.
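To make this selection behavior concrete, the following Python sketch (the function, threshold values, and device labels are illustrative assumptions, not part of the disclosure) shows how a server like the distribution server 1002 could choose between transmitting the scalable encoded data (BL+EL) 1011 and the base-layer-only data (BL) 1012 based on the obtained terminal and network information.

```python
# Minimal sketch of the layer-selection behavior of the distribution server 1002.
# The capability labels and the bandwidth threshold are illustrative assumptions.

def select_layer(terminal_capability: str, available_bandwidth_mbps: float) -> str:
    """Return which stored data to transmit: 'BL+EL' (1011) or 'BL' (1012)."""
    HIGH_CAPABILITY = {"personal_computer", "tablet"}   # e.g., devices 1004 and 1006
    if terminal_capability in HIGH_CAPABILITY and available_bandwidth_mbps >= 4.0:
        return "BL+EL"   # transmit the scalable encoded data (BL+EL) 1011 without change
    return "BL"          # extract and transmit the base layer only (BL) 1012

if __name__ == "__main__":
    print(select_layer("tablet", 10.0))        # -> BL+EL
    print(select_layer("mobile_phone", 2.0))   # -> BL
```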

Also, the extraction of the layer may be performed in the terminal device. For example, the personal computer 1004 may decode the transmitted scalable encoded data (BL+EL) 1011 and display the image of the base layer or display the image of the enhancement layer. In addition, for example, the personal computer 1004 may be configured to extract the scalable encoded data (BL) 1012 of the base layer from the transmitted scalable encoded data (BL+EL) 1011, store the extracted scalable encoded data (BL) 1012 of the base layer, transmit it to another device, or decode it and display the image of the base layer.

Of course, the numbers of scalable encoded data storage sections 1001,distribution servers 1002, networks 1003, and terminal devices arearbitrary. In addition, although the example of the distribution server1002 transmitting the data to the terminal device is described above,the example of use is not limited thereto. The data transmission system1000 is applicable to any system which selects and transmits anappropriate layer according to the capability of the terminal device,the communication environment, or the like when the scalable encodeddata is transmitted to the terminal device.

In addition, by applying the present technology to the data transmissionsystem 1000 such as FIG. 42 described above in a way similar to theapplication to the layer encoding and layer decoding as explained withreference to FIGS. 1 to 33 , an advantageous benefit similar to thatdescribed with reference to FIGS. 1 to 33 can be obtained.

<Second System>

In addition, the scalable coding, for example, is used for transmissionvia a plurality of communication media as in an example illustrated inFIG. 43 .

In a data transmission system 1100 illustrated in FIG. 43 , abroadcasting station 1101 transmits scalable encoded data (BL) 1121 ofthe base layer by terrestrial broadcasting 1111. In addition, thebroadcasting station 1101 transmits scalable encoded data (EL) 1122 ofthe enhancement layer via any arbitrary network 1112 made of acommunication network that is wired, wireless, or both (for example, thedata is packetized and transmitted).

A terminal device 1102 has a function of receiving the terrestrialbroadcasting 1111 that is broadcast by the broadcasting station 1101 andreceives the scalable encoded data (BL) 1121 of the base layertransmitted via the terrestrial broadcasting 1111. In addition, theterminal device 1102 further has a communication function by which thecommunication is performed via the network 1112, and receives thescalable encoded data (EL) 1122 of the enhancement layer transmitted viathe network 1112.

For example, according to a user's instruction or the like, the terminaldevice 1102 decodes the scalable encoded data (BL) 1121 of the baselayer acquired via the terrestrial broadcasting 1111, thereby obtainingor storing the image of the base layer or transmitting the image of thebase layer to other devices.

In addition, for example, according to the user's instruction, the terminal device 1102 combines the scalable encoded data (BL) 1121 of the base layer acquired via the terrestrial broadcasting 1111 and the scalable encoded data (EL) 1122 of the enhancement layer acquired via the network 1112, thereby obtaining the scalable encoded data (BL+EL), obtaining or storing the image of the enhancement layer by decoding the scalable encoded data (BL+EL), or transmitting the image of the enhancement layer to other devices.

As described above, the scalable encoded data, for example, can betransmitted via the different communication medium for each layer.Therefore, it is possible to disperse the workload and suppress theoccurrence of delay or overflow.

In addition, the communication medium used for the transmission of each layer may be configured to be selected according to the situation. For example, the scalable encoded data (BL) 1121 of the base layer, in which the amount of data is comparatively large, may be transmitted via a communication medium having a wide bandwidth, and the scalable encoded data (EL) 1122 of the enhancement layer, in which the amount of data is comparatively small, may be transmitted via a communication medium having a narrow bandwidth. In addition, for example, whether the communication medium that transmits the scalable encoded data (EL) 1122 of the enhancement layer is the network 1112 or the terrestrial broadcasting 1111 may be switched according to the available bandwidth of the network 1112. Of course, what has been described above can be similarly applied to data of an arbitrary layer.
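As a rough illustration of this medium switching, the sketch below (the bit-rate estimate and the decision rule are assumptions for illustration only) selects the medium for the enhancement layer (EL) 1122 according to the available bandwidth of the network 1112, falling back to the terrestrial broadcasting 1111 when the network cannot carry the enhancement layer.

```python
# Illustrative sketch: choose the transmission medium for the enhancement layer (EL) 1122.

def choose_el_medium(network_bandwidth_mbps: float, el_bitrate_mbps: float) -> str:
    """Return which medium carries the scalable encoded data (EL) 1122."""
    if network_bandwidth_mbps >= el_bitrate_mbps:
        return "network_1112"                 # enough bandwidth on the network 1112
    return "terrestrial_broadcasting_1111"    # otherwise switch to broadcasting 1111

print(choose_el_medium(8.0, 3.0))   # -> network_1112
print(choose_el_medium(1.0, 3.0))   # -> terrestrial_broadcasting_1111
```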

By controlling in this way, it is possible to further suppress theincrease in workload in the data transmission.

Of course, the number of layers is arbitrary, and the number ofcommunication media used in the transmission is also arbitrary. Inaddition, the number of terminal devices 1102 which are the destinationof the data distribution is also arbitrary. Further, although theexample of the broadcasting from the broadcasting station 1101 has beendescribed above, the use example is not limited thereto. The datatransmission system 1100 can be applied to any system which divides thescalable encoded data using a layer as a unit and transmits the scalableencoded data via a plurality of links.

In addition, by applying the present technology to the data transmissionsystem 1100 such as FIG. 43 described above in a way similar to theapplication to the layer encoding and layer decoding as described withreference to FIGS. 1 to 33 , an advantageous benefit similar to thatdescribed with reference to FIGS. 1 to 33 can be obtained.

<Third System>

In addition, the scalable coding is used, for example, in the storage of encoded data, as in an example illustrated in FIG. 44 .

In an image capturing system 1200 illustrated in FIG. 44 , an imagecapturing device 1201 performs scalable coding on image data obtained bycapturing an image of a subject 1211, and supplies a scalable codingresult as the scalable encoded data (BL+EL) 1221 to a scalable encodeddata storage device 1202.

The scalable encoded data storage device 1202 stores the scalableencoded data (BL+EL) 1221 supplied from the image capturing device 1201with quality according to the situation. For example, in the case ofnormal circumstances, the scalable encoded data storage device 1202extracts data of the base layer from the scalable encoded data (BL+EL)1221, and stores the extracted data as scalable encoded data (BL) 1222of the base layer having a small amount of data at low quality. On theother hand, for example, in the case of notable circumstances, thescalable encoded data storage device 1202 stores the scalable encodeddata (BL+EL) 1221 having a large amount of data at high quality withoutchange.

In this way, because the scalable encoded data storage device 1202 cansave the image at high quality only in a necessary case, it is possibleto suppress the decrease of the value of the image due to thedeterioration of the image quality and suppress the increase of theamount of data, and it is possible to improve the use efficiency of thestorage region.

For example, the image capturing device 1201 is assumed to be a monitoring camera. Because the content of the captured image is unlikely to be important when a monitoring target (for example, an intruder) is not shown in the captured image (in the case of the normal circumstances), the priority is on the reduction of the amount of data, and the image data (scalable encoded data) is stored at low quality. On the other hand, because the content of the captured image is likely to be important when a monitoring target is shown as the subject 1211 in the captured image (in the case of the notable circumstances), the priority is on the image quality, and the image data (scalable encoded data) is stored at high quality.

For example, whether the case is the case of the normal circumstances orthe notable circumstances may be determined by the scalable encoded datastorage device 1202 by analyzing the image. In addition, the imagecapturing device 1201 may be configured to make a determination andtransmit the determination result to the scalable encoded data storagedevice 1202.
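As an illustration of this storage policy, the following sketch keeps the full scalable encoded data (BL+EL) 1221 only in the notable circumstances and the base layer (BL) 1222 otherwise; the analysis function is a hypothetical stand-in for the image analysis (or for a result reported by the image capturing device 1201).

```python
# Illustrative sketch of the storage decision of the scalable encoded data storage device 1202.

def detect_monitoring_target(frame) -> bool:
    """Hypothetical stand-in: True when a monitoring target (e.g., an intruder) appears."""
    return getattr(frame, "has_target", False)

def select_data_to_store(bl_el_data: bytes, bl_data: bytes, frame) -> bytes:
    if detect_monitoring_target(frame):
        return bl_el_data   # notable circumstances: store BL+EL 1221 at high quality
    return bl_data          # normal circumstances: store BL 1222 only, saving storage
```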

A determination criterion of whether the case is the case of the normalcircumstances or the notable circumstances is arbitrary and the contentof the image which is the determination criterion is arbitrary. Ofcourse, a condition other than the content of the image can bedesignated as the determination criterion. For example, switching may beconfigured to be performed according to the magnitude or waveform ofrecorded sound, by a predetermined time interval, or by an externalinstruction such as the user's instruction.

In addition, although the two states of the normal circumstances and thenotable circumstances have been described above, the number of states isarbitrary, and for example, switching may be configured to be performedamong three or more states such as normal circumstances, slightlynotable circumstances, notable circumstances, and highly notablecircumstances. However, the upper limit number of states to be switcheddepends upon the number of layers of the scalable encoded data.

In addition, the image capturing device 1201 may determine the number of layers of the scalable coding according to the state. For example, in the case of the normal circumstances, the image capturing device 1201 may generate the scalable encoded data (BL) 1222 of the base layer having a small amount of data at low quality and supply the data to the scalable encoded data storage device 1202. In addition, for example, in the case of the notable circumstances, the image capturing device 1201 may generate the scalable encoded data (BL+EL) 1221 having a large amount of data at high quality and supply the data to the scalable encoded data storage device 1202.

Although the monitoring camera has been described above as the example,the usage of the image capturing system 1200 is arbitrary and is notlimited to the monitoring camera.

In addition, by applying the present technology to the image capturingsystem 1200 such as FIG. 44 described above in a way similar to theapplication to the layer encoding and layer decoding as described withreference to FIGS. 1 to 33 , an advantageous benefit similar to thatdescribed with reference to FIGS. 1 to 33 can be obtained.

8. Fifth Embodiment: Other Embodiments

The above embodiments have been described in connection with examples of the device, the system, and the like to which the present technology is applied, but the present technology is not limited to the above examples and may be implemented as any constituent element mounted in such a device or in a device configuring such a system, for example, a processor serving as a system large scale integration (LSI) or the like, a module using a plurality of processors or the like, a unit using a plurality of modules or the like, a set (that is, some constituent elements of a device) in which any other function is further added to a unit, or the like.

<Video Set>

An example in which the present technology is implemented as a set willbe described with reference to FIG. 45 . FIG. 45 illustrates an exampleof a schematic configuration of a video set to which the presenttechnology is applied.

In recent years, functions of electronic devices have become diverse, and in their development or manufacturing, there are many cases in which a plurality of constituent elements having relevant functions are combined and implemented as a set having a plurality of functions, in addition to cases in which an element is implemented, sold, or provided on its own as a constituent element having a single function.

A video set 1300 illustrated in FIG. 45 is a multi-functionalizedconfiguration in which a device having a function related to imageencoding and/or image decoding is combined with a device having anyother function related to the function.

The video set 1300 includes a module group such as a video module 1311,an external memory 1312, a power management module 1313, and a front endmodule 1314 and a device having relevant functions such as connectivity1321, a camera 1322, and a sensor 1323 as illustrated in FIG. 45 .

A module is a part having a set of functions into which several relevant part functions are mutually integrated. A concrete physical configuration is arbitrary, but, for example, it is configured such that a plurality of processors having respective functions, electronic circuit elements such as a resistor and a capacitor, and other devices are arranged and integrated on a wiring substrate. Further, a new module may be obtained by combining another module or a processor with a module.

In the case of the example of FIG. 45 , the video module 1311 is a combination of configurations having functions related to image processing, and includes an application processor 1331, a video processor 1332, a broadband modem 1333, and a radio frequency (RF) module 1334.

A processor is one in which a configuration having a certain function isintegrated into a semiconductor chip through System On a Chip (SoC), andalso refers to, for example, a system LSI or the like. The configurationhaving the certain function may be a logic circuit (hardwareconfiguration), may be a CPU, a ROM, a RAM, and a program (softwareconfiguration) executed using the CPU, the ROM, and the RAM, and may bea combination of a hardware configuration and a software configuration.For example, a processor may include a logic circuit, a CPU, a ROM, aRAM, and the like, some functions may be implemented through the logiccircuit (hardware configuration), and other functions may be implementedthrough a program (software configuration) executed by the CPU.

The application processor 1331 of FIG. 45 is a processor that executesan application related to image processing. An application executed bythe application processor 1331 can not only perform a calculationprocess but can also control constituent elements inside and outside thevideo module 1311 such as the video processor 1332 as necessary in orderto implement a certain function.

The video processor 1332 is a processor having a function related toimage encoding and/or image decoding.

The broadband modem 1333 is a processor (or a module) that performsprocessing related to wired and/or wireless broadband communication thatis performed via a broadband line such as the Internet or a publictelephone line network. For example, the broadband modem 1333 performsdigital modulation on data (a digital signal) to be transmitted andconverts the data into an analog signal, or performs demodulation on areceived analog signal and converts the analog signal into data (adigital signal). For example, the broadband modem 1333 can performdigital modulation and demodulation on arbitrary information such asimage data processed by the video processor 1332, a stream includingencoded image data, an application program, or setting data.

The RF module 1334 is a module that performs a frequency transform process, a modulation/demodulation process, an amplification process, a filtering process, and the like on an RF signal transmitted and received through an antenna. For example, the RF module 1334 performs a frequency transform on a baseband signal generated by the broadband modem 1333, and generates an RF signal. Further, for example, the RF module 1334 performs a frequency transform on an RF signal received through the front end module 1314, and generates a baseband signal.

Further, as shown by a dotted line 1341 in FIG. 45 , the applicationprocessor 1331 and the video processor 1332 may be integrated into asingle processor.

The external memory 1312 is a module that is installed outside the video module 1311 and has a storage device used by the video module 1311. The storage device of the external memory 1312 can be implemented by any physical configuration, but is commonly used to store large-capacity data such as image data in frame units, and thus it is desirable to implement the storage device of the external memory 1312 using a relatively inexpensive large-capacity semiconductor memory such as a dynamic random access memory (DRAM).

The power management module 1313 manages and controls power supply tothe video module 1311 (the respective constituent elements in the videomodule 1311).

The front end module 1314 is a module that provides a front end function (a circuit at the transmitting and receiving end on the antenna side) to the RF module 1334. The front end module 1314 includes, for example, an antenna section 1351, a filter 1352, and an amplification section 1353 as illustrated in FIG. 45 .

The antenna section 1351 includes an antenna that transmits and receivesa radio signal and a peripheral configuration. The antenna section 1351transmits a signal provided from the amplification section 1353 as aradio signal, and provides a received radio signal to the filter 1352 asan electrical signal (RF signal). The filter 1352 performs, for example,a filtering process on an RF signal received through the antenna section1351, and provides a processed RF signal to the RF module 1334. Theamplification section 1353 amplifies the RF signal provided from the RFmodule 1334, and provides the amplified RF signal to the antenna section1351.

The connectivity 1321 is a module having a function related toconnection with the outside. A physical configuration of theconnectivity 1321 is arbitrary. For example, the connectivity 1321includes a configuration having a communication function other than thatof a communication standard supported by the broadband modem 1333, anexternal I/O terminal, or the like.

For example, the connectivity 1321 may include a module having a communication function based on a wireless communication standard such as Bluetooth (a registered trademark), IEEE 802.11 (for example, Wireless Fidelity (Wi-Fi) (a registered trademark)), Near Field Communication (NFC), or InfraRed Data Association (IrDA), an antenna that transmits and receives a signal satisfying the standard, or the like. Further, for example, the connectivity 1321 may include a module having a communication function based on a wired communication standard such as Universal Serial Bus (USB) or High-Definition Multimedia Interface (HDMI) (a registered trademark), or a terminal that satisfies the standard. Furthermore, for example, the connectivity 1321 may include any other data (signal) transmission function or the like such as an analog I/O terminal.

Further, the connectivity 1321 may include a device of a transmissiondestination of data (signal). For example, the connectivity 1321 mayinclude a drive (including a hard disk, a solid state drive (SSD), aNetwork Attached Storage (NAS), or the like as well as a drive of aremovable medium) that reads/writes data from/in a recording medium suchas a magnetic disk, an optical disc, a magneto optical disc, or asemiconductor memory. Furthermore, the connectivity 1321 may include anoutput device (a monitor, a speaker, or the like) that outputs images orsound.

The camera 1322 is a module having a function of photographing a subjectand obtaining image data of the subject. For example, image dataobtained by image capture from the camera 1322 is provided to andencoded by the video processor 1332.

The sensor 1323 is a module having an arbitrary sensor function such asa sound sensor, an ultrasonic sensor, an optical sensor, an illuminancesensor, an infrared sensor, an image sensor, a rotation sensor, an anglesensor, an angular velocity sensor, a velocity sensor, an accelerationsensor, an inclination sensor, a magnetic identification sensor, a shocksensor, or a temperature sensor. For example, data detected by thesensor 1323 is provided to the application processor 1331 and used by anapplication or the like.

A configuration described above as a module may be implemented as aprocessor, and a configuration described as a processor may beimplemented as a module.

In the video set 1300 having the above configuration, the presenttechnology can be applied to the video processor 1332 as will bedescribed later. Thus, the video set 1300 can be implemented as a set towhich the present technology is applied.

<Exemplary Configuration of Video Processor>

FIG. 46 illustrates an example of a schematic configuration of the videoprocessor 1332 (FIG. 45 ) to which the present technology is applied.

In the case of the example of FIG. 46 , the video processor 1332 has a function of receiving an input of a video signal and an audio signal and encoding the video signal and the audio signal according to a certain scheme, and a function of decoding encoded video data and audio data and reproducing and outputting a video signal and an audio signal.

The video processor 1332 includes a video input processing section 1401,a first image enlarging/reducing section 1402, a second imageenlarging/reducing section 1403, a video output processing section 1404,a frame memory 1405, and a memory control section 1406 as illustrated inFIG. 46 . The video processor 1332 further includes an encoding/decodingengine 1407, video elementary stream (ES) buffers 1408A and 1408B, andaudio ES buffers 1409A and 1409B. The video processor 1332 furtherincludes an audio encoder 1410, an audio decoder 1411, a multiplexer(multiplexer (MUX)) 1412, a demultiplexer (demultiplexer (DMUX)) 1413,and a stream buffer 1414.

For example, the video input processing section 1401 acquires a videosignal input from the connectivity 1321 (FIG. 45 ) or the like, andconverts the video signal into digital image data. The first imageenlarging/reducing section 1402 performs, for example, a formatconversion process and an image enlargement/reduction process on theimage data. The second image enlarging/reducing section 1403 performs animage enlargement/reduction process on the image data according to aformat of a destination to which the image data is output through thevideo output processing section 1404 or performs the format conversionprocess and the image enlargement/reduction process which are similar tothose of the first image enlarging/reducing section 1402 on the imagedata. The video output processing section 1404 performs formatconversion and conversion into an analog signal on the image data, andoutputs a reproduced video signal, for example, to the connectivity 1321(FIG. 45 ) or the like.

The frame memory 1405 is an image data memory that is shared by thevideo input processing section 1401, the first image enlarging/reducingsection 1402, the second image enlarging/reducing section 1403, thevideo output processing section 1404, and the encoding/decoding engine1407. The frame memory 1405 is implemented as, for example, asemiconductor memory such as a DRAM.

The memory control section 1406 receives a synchronous signal from theencoding/decoding engine 1407, and controls writing/reading access tothe frame memory 1405 according to an access schedule for the framememory 1405 written in an access management table 1406A. The accessmanagement table 1406A is updated through the memory control section1406 according to processing executed by the encoding/decoding engine1407, the first image enlarging/reducing section 1402, the second imageenlarging/reducing section 1403, or the like.

The encoding/decoding engine 1407 performs an encoding process ofencoding image data and a decoding process of decoding a video streamthat is data obtained by encoding image data. For example, theencoding/decoding engine 1407 encodes image data read from the framememory 1405, and sequentially writes the encoded image data in the videoES buffer 1408A as a video stream. Further, for example, theencoding/decoding engine 1407 sequentially reads the video stream fromthe video ES buffer 1408B, sequentially decodes the video stream, andsequentially writes the decoded image data in the frame memory 1405.Regarding the encoding or the decoding, the encoding/decoding engine1407 uses the frame memory 1405 as a working area. Further, theencoding/decoding engine 1407 outputs the synchronous signal to thememory control section 1406, for example, under timing under whichprocessing of each macroblock starts.

The video ES buffer 1408A buffers the video stream generated by theencoding/decoding engine 1407, and then provides the video stream to themultiplexer (MUX) 1412. The video ES buffer 1408B buffers the videostream provided from the demultiplexer (DMUX) 1413, and then providesthe video stream to the encoding/decoding engine 1407.

The audio ES buffer 1409A buffers an audio stream generated by the audioencoder 1410, and then provides the audio stream to the multiplexer(MUX) 1412. The audio ES buffer 1409B buffers an audio stream providedfrom the demultiplexer (DMUX) 1413, and then provides the audio streamto the audio decoder 1411.

For example, the audio encoder 1410 converts an audio signal input from,for example, the connectivity 1321 (FIG. 45 ) or the like into a digitalsignal, and encodes the digital signal according to a certain schemesuch as an MPEG audio scheme or an AudioCode number 3 (AC3) scheme. Theaudio encoder 1410 sequentially writes the audio stream that is dataobtained by encoding the audio signal in the audio ES buffer 1409A. Theaudio decoder 1411 decodes the audio stream provided from the audio ESbuffer 1409B, performs, for example, conversion into an analog signal,and provides a reproduced audio signal to, for example, the connectivity1321 (FIG. 45 ) or the like.

The multiplexer (MUX) 1412 performs multiplexing of the video stream andthe audio stream. A multiplexing method (that is, a format of abitstream generated by multiplexing) is arbitrary. Further, in the eventof multiplexing, the multiplexer (MUX) 1412 may add certain headerinformation or the like to the bitstream. In other words, themultiplexer (MUX) 1412 may convert a stream format by multiplexing. Forexample, the multiplexer (MUX) 1412 multiplexes the video stream and theaudio stream to be converted into a transport stream that is a bitstreamof a transfer format. Further, for example, the multiplexer (MUX) 1412multiplexes the video stream and the audio stream to be converted intodata (file data) of a recording file format.

The demultiplexer (DMUX) 1413 demultiplexes the bitstream obtained bymultiplexing the video stream and the audio stream by a methodcorresponding to the multiplexing performed by the multiplexer (MUX)1412. In other words, the demultiplexer (DMUX) 1413 extracts the videostream and the audio stream (separates the video stream and the audiostream) from the bitstream read from the stream buffer 1414. In otherwords, the demultiplexer (DMUX) 1413 can perform conversion (inverseconversion of conversion performed by the multiplexer (MUX) 1412) of aformat of a stream through the demultiplexing. For example, thedemultiplexer (DMUX) 1413 can acquire the transport stream providedfrom, for example, the connectivity 1321 or the broadband modem 1333(both FIG. 45 ) through the stream buffer 1414 and convert the transportstream into a video stream and an audio stream through thedemultiplexing. Further, for example, the demultiplexer (DMUX) 1413 canacquire file data read from various kinds of recording media by, forexample, the connectivity 1321 (FIG. 45 ) through the stream buffer 1414and convert the file data into a video stream and an audio stream by thedemultiplexing.

The stream buffer 1414 buffers the bitstream. For example, the streambuffer 1414 buffers the transport stream provided from the multiplexer(MUX) 1412, and provides the transport stream to, for example, theconnectivity 1321 or the broadband modem 1333 (both FIG. 45 ) undercertain timing or based on an external request or the like.

Further, for example, the stream buffer 1414 buffers file data providedfrom the multiplexer (MUX) 1412, provides the file data to, for example,the connectivity 1321 (FIG. 45 ) or the like under certain timing orbased on an external request or the like, and causes the file data to berecorded in various kinds of recording media.

Furthermore, the stream buffer 1414 buffers the transport streamacquired through, for example, the connectivity 1321 or the broadbandmodem 1333 (both FIG. 45 ), and provides the transport stream to thedemultiplexer (DMUX) 1413 under certain timing or based on an externalrequest or the like.

Further, the stream buffer 1414 buffers file data read from variouskinds of recording media in, for example, the connectivity 1321 (FIG. 45) or the like, and provides the file data to the demultiplexer (DMUX)1413 under certain timing or based on an external request or the like.

Next, an operation of the video processor 1332 having the above configuration will be described. The video signal input to the video processor 1332, for example, from the connectivity 1321 (FIG. 45 ) or the like is converted into digital image data according to a certain scheme such as a 4:2:2 Y/Cb/Cr scheme in the video input processing section 1401 and sequentially written in the frame memory 1405. The digital image data is read out to the first image enlarging/reducing section 1402 or the second image enlarging/reducing section 1403, subjected to a format conversion process of performing a format conversion into a certain scheme such as a 4:2:0 Y/Cb/Cr scheme and an enlargement/reduction process, and written in the frame memory 1405 again. The image data is encoded by the encoding/decoding engine 1407, and written in the video ES buffer 1408A as a video stream.
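As one concrete piece of the flow above, the sketch below shows a simple 4:2:2 to 4:2:0 chroma conversion (vertical averaging of chroma rows). It is a minimal, list-based illustration of that format conversion, not the actual processing of the image enlarging/reducing sections.

```python
# Minimal sketch of a 4:2:2 -> 4:2:0 chroma conversion. In 4:2:2 the chroma planes
# already have half the luma width; 4:2:0 additionally halves the number of rows,
# here by averaging each vertical pair of chroma rows.

def chroma_422_to_420(chroma_plane):
    """chroma_plane: list of rows of one 4:2:2 chroma plane (Cb or Cr).
    Returns a plane with half the number of rows (4:2:0)."""
    out = []
    for y in range(0, len(chroma_plane) - 1, 2):
        top, bottom = chroma_plane[y], chroma_plane[y + 1]
        out.append([(a + b + 1) // 2 for a, b in zip(top, bottom)])
    return out

# Example: a 4-row chroma plane becomes a 2-row plane.
plane_422 = [[128, 130], [126, 132], [120, 124], [122, 126]]
print(chroma_422_to_420(plane_422))   # [[127, 131], [121, 125]]
```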

Further, an audio signal input to the video processor 1332 from theconnectivity 1321 (FIG. 45 ) or the like is encoded by the audio encoder1410, and written in the audio ES buffer 1409A as an audio stream.

The video stream of the video ES buffer 1408A and the audio stream ofthe audio ES buffer 1409A are read out to and multiplexed by themultiplexer (MUX) 1412, and converted into a transport stream, filedata, or the like. The transport stream generated by the multiplexer(MUX) 1412 is buffered in the stream buffer 1414, and then output to anexternal network through, for example, the connectivity 1321 or thebroadband modem 1333 (both FIG. 45 ). Further, the file data generatedby the multiplexer (MUX) 1412 is buffered in the stream buffer 1414,then output to, for example, the connectivity 1321 (FIG. 45 ) or thelike, and recorded in various kinds of recording media.

Further, the transport stream input to the video processor 1332 from anexternal network through, for example, the connectivity 1321 or thebroadband modem 1333 (both FIG. 45 ) is buffered in the stream buffer1414 and then demultiplexed by the demultiplexer (DMUX) 1413. Further,the file data that is read from various kinds of recording media in, forexample, the connectivity 1321 (FIG. 45 ) or the like and then input tothe video processor 1332 is buffered in the stream buffer 1414 and thendemultiplexed by the demultiplexer (DMUX) 1413. In other words, thetransport stream or the file data input to the video processor 1332 isdemultiplexed into the video stream and the audio stream through thedemultiplexer (DMUX) 1413.

The audio stream is provided to the audio decoder 1411 through the audioES buffer 1409B and decoded, and an audio signal is reproduced. Further,the video stream is written in the video ES buffer 1408B, sequentiallyread out to and decoded by the encoding/decoding engine 1407, andwritten in the frame memory 1405. The decoded image data is subjected tothe enlargement/reduction process performed by the second imageenlarging/reducing section 1403, and written in the frame memory 1405.Then, the decoded image data is read out to the video output processingsection 1404, subjected to the format conversion process of performingformat conversion to a certain scheme such as a 4:2:2Y/Cb/Cr scheme, andconverted into an analog signal, and a video signal is reproduced.

When the present technology is applied to the video processor 1332having the above configuration, it is preferable that the aboveembodiments of the present technology be applied to theencoding/decoding engine 1407. In other words, for example, theencoding/decoding engine 1407 preferably has the functions of the imageencoding device 100 (FIG. 18 ) and the image decoding device 200 (FIG.26 ) according to the above embodiments. Accordingly, the videoprocessor 1332 can obtain advantageous benefits similar to theadvantageous benefits described above with reference to FIGS. 1 to 33 .

Further, in the encoding/decoding engine 1407, the present technology(that is, the functions of the image encoding devices or the imagedecoding devices according to the above embodiment) may be implementedby either or both of hardware such as a logic circuit and software suchas an embedded program.

<Other Exemplary Configuration of Video Processor>

FIG. 47 illustrates another example of a schematic configuration of thevideo processor 1332 (FIG. 45 ) to which the present technology isapplied. In the case of the example of FIG. 47 , the video processor1332 has a function of encoding and decoding video data according to acertain scheme.

More specifically, the video processor 1332 includes a control section1511, a display interface 1512, a display engine 1513, an imageprocessing engine 1514, and an internal memory 1515 as illustrated inFIG. 47 . The video processor 1332 further includes a codec engine 1516,a memory interface 1517, a multiplexer/demultiplexer (MUX/DMUX) 1518, anetwork interface 1519, and a video interface 1520.

The control section 1511 controls an operation of each processingsection in the video processor 1332 such as the display interface 1512,the display engine 1513, the image processing engine 1514, and the codecengine 1516.

The control section 1511 includes, for example, a main CPU 1531, a sub CPU 1532, and a system controller 1533 as illustrated in FIG. 47 . The main CPU 1531 executes, for example, a program for controlling an operation of each processing section in the video processor 1332. The main CPU 1531 generates a control signal, for example, according to the program, and provides the control signal to each processing section (that is, controls an operation of each processing section). The sub CPU 1532 plays a supplementary role to the main CPU 1531. For example, the sub CPU 1532 executes a child process or a subroutine of a program executed by the main CPU 1531. The system controller 1533 controls operations of the main CPU 1531 and the sub CPU 1532, for example, designating a program to be executed by the main CPU 1531 and the sub CPU 1532.

The display interface 1512 outputs image data to, for example, theconnectivity 1321 (FIG. 45 ) or the like under control of the controlsection 1511. For example, the display interface 1512 converts imagedata of digital data into an analog signal, and outputs the analogsignal to, for example, the monitor device of the connectivity 1321(FIG. 45 ) as a reproduced video signal or outputs the image data of thedigital data to, for example, the monitor device of the connectivity1321 (FIG. 45 ).

The display engine 1513 performs various kinds of conversion processessuch as a format conversion process, a size conversion process, and acolor gamut conversion process on the image data under control of thecontrol section 1511 in compliance with, for example, a hardwarespecification of the monitor device that displays the image.

The image processing engine 1514 performs certain image processing suchas a filtering process for improving an image quality on the image dataunder control of the control section 1511.

The internal memory 1515 is a memory that is installed in the videoprocessor 1332 and shared by the display engine 1513, the imageprocessing engine 1514, and the codec engine 1516. The internal memory1515 is used for data transfer performed among, for example, the displayengine 1513, the image processing engine 1514, and the codec engine1516. For example, the internal memory 1515 stores data provided fromthe display engine 1513, the image processing engine 1514, or the codecengine 1516, and provides the data to the display engine 1513, the imageprocessing engine 1514, or the codec engine 1516 as necessary (forexample, according to a request). The internal memory 1515 can beimplemented by any storage device, but since the internal memory 1515 ismostly used for storage of small-capacity data such as image data ofblock units or parameters, it is desirable to implement the internalmemory 1515 using a semiconductor memory that is relatively small incapacity (for example, compared to the external memory 1312) and fast inresponse speed such as a static random access memory (SRAM).

The codec engine 1516 performs processing related to encoding anddecoding of image data. An encoding/decoding scheme supported by thecodec engine 1516 is arbitrary, and one or more schemes may be supportedby the codec engine 1516. For example, the codec engine 1516 may have acodec function of supporting a plurality of encoding/decoding schemesand perform encoding of image data or decoding of encoded data using ascheme selected from among the schemes.

In the example illustrated in FIG. 47 , the codec engine 1516 includes,for example, an MPEG-2 Video 1541, an AVC/H.264 1542, an HEVC/H.2651543, an HEVC/H.265 (Scalable) 1544, an HEVC/H.265 (Multi-view) 1545,and an MPEG-DASH 1551 as functional blocks of processing related to acodec.

The MPEG-2 Video 1541 is a functional block for encoding or decodingimage data according to an MPEG-2 scheme. The AVC/H.264 1542 is afunctional block for encoding or decoding image data according to an AVCscheme. The HEVC/H.265 1543 is a functional block for encoding ordecoding image data according to an HEVC scheme. The HEVC/H.265(Scalable) 1544 is a functional block for performing scalable coding orscalable decoding on image data according to the HEVC scheme. TheHEVC/H.265 (Multi-view) 1545 is a functional block for performingmulti-view encoding or multi-view decoding on image data according tothe HEVC scheme.

The MPEG-DASH 1551 is a functional block for transmitting and receiving image data according to MPEG-Dynamic Adaptive Streaming over HTTP (MPEG-DASH). MPEG-DASH is a technique of streaming video using HyperText Transfer Protocol (HTTP), and has a feature of selecting, in units of segments, an appropriate one from among a plurality of pieces of encoded data that differ in previously prepared resolution or the like and transmitting the selected one. The MPEG-DASH 1551 performs generation of a stream complying with the standard, transmission control of the stream, and the like, and uses the MPEG-2 Video 1541 to the HEVC/H.265 (Multi-view) 1545 for encoding and decoding of image data.

The memory interface 1517 is an interface for the external memory 1312.Data provided from the image processing engine 1514 or the codec engine1516 is provided to the external memory 1312 through the memoryinterface 1517. Further, data read from the external memory 1312 isprovided to the video processor 1332 (the image processing engine 1514or the codec engine 1516) through the memory interface 1517.

The multiplexer/demultiplexer (MUX/DMUX) 1518 performs multiplexing anddemultiplexing of various kinds of data related to an image such as abitstream of encoded data, image data, and a video signal. Themultiplexing/demultiplexing method is arbitrary. For example, in theevent of multiplexing, the multiplexer/demultiplexer (MUX/DMUX) 1518 cannot only combine a plurality of pieces of data into one but can also addcertain header information or the like to the data. Further, in theevent of demultiplexing, the multiplexer/demultiplexer (MUX/DMUX) 1518can not only divide one piece of data into a plurality of pieces of databut can also add certain header information or the like to each dividedpiece of data. In other words, the multiplexer/demultiplexer (MUX/DMUX)1518 can convert a data format through multiplexing and demultiplexing.For example, the multiplexer/demultiplexer (MUX/DMUX) 1518 can multiplexa bitstream to be converted into a transport stream serving as abitstream of a transfer format or data (file data) of a recording fileformat. Of course, inverse conversion can also be performed throughdemultiplexing.

The network interface 1519 is an interface for, for example, thebroadband modem 1333 or the connectivity 1321 (both FIG. 45 ). The videointerface 1520 is an interface for, for example, the connectivity 1321or the camera 1322 (both FIG. 45 ).

Next, an exemplary operation of the video processor 1332 will bedescribed. For example, when the transport stream is received from theexternal network through, for example, the connectivity 1321 or thebroadband modem 1333 (both FIG. 45 ), the transport stream is providedto the multiplexer/demultiplexer (MUX/DMUX) 1518 through the networkinterface 1519, demultiplexed, and then decoded by the codec engine1516. Image data obtained by the decoding of the codec engine 1516 issubjected to certain image processing performed, for example, by theimage processing engine 1514, subjected to certain conversion performedby the display engine 1513, and provided to, for example, theconnectivity 1321 (FIG. 45 ) or the like through the display interface1512, and the image is displayed on the monitor. Further, for example,image data obtained by the decoding of the codec engine 1516 is encodedby the codec engine 1516 again, multiplexed by themultiplexer/demultiplexer (MUX/DMUX) 1518 to be converted into filedata, output to, for example, the connectivity 1321 (FIG. 45 ) or thelike through the video interface 1520, and then recorded in variouskinds of recording media.

Furthermore, for example, file data of encoded data obtained by encodingimage data read from a recording medium (not illustrated) through theconnectivity 1321 (FIG. 45 ) or the like is provided to themultiplexer/demultiplexer (MUX/DMUX) 1518 through the video interface1520, and demultiplexed, and decoded by the codec engine 1516. Imagedata obtained by the decoding of the codec engine 1516 is subjected tocertain image processing performed by the image processing engine 1514,subjected to certain conversion performed by the display engine 1513,and provided to, for example, the connectivity 1321 (FIG. 45 ) or thelike through the display interface 1512, and the image is displayed onthe monitor. Further, for example, image data obtained by the decodingof the codec engine 1516 is encoded by the codec engine 1516 again,multiplexed by the multiplexer/demultiplexer (MUX/DMUX) 1518 to beconverted into a transport stream, provided to, for example, theconnectivity 1321 or the broadband modem 1333 (both FIG. 45 ) throughthe network interface 1519, and transmitted to another device (notillustrated).

Further, transfer of image data or other data between the processingsections in the video processor 1332 is performed, for example, usingthe internal memory 1515 or the external memory 1312. Furthermore, thepower management module 1313 controls, for example, power supply to thecontrol section 1511.

When the present technology is applied to the video processor 1332 having the above configuration, it is desirable to apply the above embodiments of the present technology to the codec engine 1516. In other words, for example, it is preferable that the codec engine 1516 have a functional block for implementing the image encoding device 100 (FIG. 18 ) and the image decoding device 200 (FIG. 26 ) according to the above embodiments. Accordingly, the video processor 1332 can obtain advantageous benefits similar to the advantageous benefits described above with reference to FIGS. 1 to 33 .

Further, in the codec engine 1516, the present technology (that is, thefunctions of the image encoding devices or the image decoding devicesaccording to the above embodiment) may be implemented by either or bothof hardware such as a logic circuit and software such as an embeddedprogram.

Two exemplary configurations of the video processor 1332 have beendescribed above, but the configuration of the video processor 1332 isarbitrary and may be any configuration other than the above twoexemplary configurations. Further, the video processor 1332 may beconfigured with a single semiconductor chip or may be configured with aplurality of semiconductor chips. For example, the video processor 1332may be configured with a three-dimensionally stacked LSI in which aplurality of semiconductors is stacked. Further, the video processor1332 may be implemented by a plurality of LSIs.

<Application Examples to Devices>

The video set 1300 may be incorporated into various kinds of devicesthat process image data. For example, the video set 1300 may beincorporated into the television device 900 (FIG. 38 ), the mobiletelephone 920 (FIG. 39 ), the recording/reproducing device 940 (FIG. 40), the imaging device 960 (FIG. 41 ), or the like. As the video set 1300is incorporated, the devices can have advantageous benefits similar tothe advantageous benefits described above with reference to FIGS. 1 to33 .

Further, the video set 1300 may also be incorporated into a terminaldevice such as the personal computer 1004, the AV device 1005, thetablet device 1006, or the mobile telephone 1007 in the datatransmission system 1000 of FIG. 42 , the broadcasting station 1101 orthe terminal device 1102 in the data transmission system 1100 of FIG. 43, or the imaging device 1201 or the scalable encoded data storage device1202 in the imaging system 1200 of FIG. 44 . As the video set 1300 isincorporated, the devices can have advantageous benefits similar to theadvantageous benefits described above with reference to FIGS. 1 to 33 .Further, the video set 1300 may be incorporated into the contentreproducing system of FIG. 48 or the wireless communication system ofFIG. 54 .

Further, as far as including the video processor 1332, each constituentelement of the video set 1300 described above can be implemented as aconfiguration to which the present technology is applied. For example,the video processor 1332 alone can be implemented as a video processorto which the present technology is applied. Further, for example, theprocessors indicated by the dotted line 1341 as described above, thevideo module 1311, or the like can be implemented as, for example, aprocessor or a module to which the present technology is applied.Further, for example, a combination of the video module 1311, theexternal memory 1312, the power management module 1313, and the frontend module 1314 can be implemented as a video unit 1361 to which thepresent technology is applied. These configurations can haveadvantageous benefits similar to the advantageous benefits describedabove with reference to FIGS. 1 to 33 .

In other words, a configuration including the video processor 1332 can be incorporated into various kinds of devices that process image data, similarly to the case of the video set 1300. For example, the video processor 1332, the processors indicated by the dotted line 1341, the video module 1311, or the video unit 1361 can be incorporated into the television device 900 (FIG. 38 ), the mobile telephone 920 (FIG. 39 ), the recording/reproducing device 940 (FIG. 40 ), the imaging device 960 (FIG. 41 ), the terminal device such as the personal computer 1004, the AV device 1005, the tablet device 1006, or the mobile telephone 1007 in the data transmission system 1000 of FIG. 42 , the broadcasting station 1101 or the terminal device 1102 in the data transmission system 1100 of FIG. 43 , the imaging device 1201 or the scalable encoded data storage device 1202 in the imaging system 1200 of FIG. 44 , or the like. Further, the configuration including the video processor 1332 may be incorporated into the content reproducing system of FIG. 48 or the wireless communication system of FIG. 54 . Furthermore, by incorporating the configuration to which the present technology is applied, the devices can have advantageous benefits similar to the advantageous benefits described above with reference to FIGS. 1 to 33 , similarly to the video set 1300.

The present technology can also be applied to a system of selecting appropriate data in units of segments from among a plurality of pieces of encoded data having different resolutions that are prepared in advance and using the selected data, for example, a content reproducing system of HTTP streaming such as MPEG-DASH, which will be described later, or a wireless communication system of the Wi-Fi standard.

9. Application Example of MPEG-DASH

<Overview of Content Reproducing System>

First, a content reproducing system to which the present technology isapplicable will be schematically described with reference to FIGS. 48 to50 .

A basic configuration that is common in the embodiments will bedescribed below with reference to FIGS. 48 and 49 .

FIG. 48 is an explanatory diagram of a configuration of a contentreproducing system. The content reproducing system includes contentservers 1610 and 1611, a network 1612, and a content reproducing device1620 (a client device) as illustrated in FIG. 48 .

The content servers 1610 and 1611 are connected with the contentreproducing device 1620 via the network 1612. The network 1612 is awired or wireless transmission path of information transmitted from adevice connected to the network 1612.

For example, the network 1612 may include a public line network such asthe Internet, a telephone line network, or a satellite communicationnetwork, various kinds of LANs such as Ethernet (a registeredtrademark), a wide area network (WAN), or the like. Further, the network1612 may include a dedicated line network such as an Internetprotocol-virtual private network (IP-VPN).

The content server 1610 encodes content data, and generates and stores a data file including the encoded data and meta information of the encoded data. When the content server 1610 generates a data file of an MP4 format, the encoded data corresponds to "mdat," and the meta information corresponds to "moov."

Further, content data may be music data such as music, a lecture, or aradio program, video data such as a movie, a television program, a videoprogram, a photograph, a document, a painting, or a graph, a game,software, or the like.

Here, the content server 1610 generates a plurality of data files for the same content at different bit rates. Further, in response to a content reproduction request received from the content reproducing device 1620, the content server 1611 transmits, to the content reproducing device 1620, URL information of the content server 1610 together with information of a parameter to be added to the corresponding URL by the content reproducing device 1620. Details on this will be described below with reference to FIG. 49 .

FIG. 49 is an explanatory diagram of a data flow in the contentreproducing system of FIG. 48 . The content server 1610 encodes the samecontent data at different bit rates, and generates, for example, file Aof 2 Mbps, file B of 1.5 Mbps, and file C of 1 Mbps as illustrated inFIG. 49 . Relatively, file A has a high bit rate, file B has a standardbit rate, and file C has a low bit rate.

Further, encoded data of each file is divided into a plurality of segments as illustrated in FIG. 49 . For example, encoded data of file A is divided into segments such as "A1," "A2," "A3," . . . , and "An," encoded data of file B is divided into segments such as "B1," "B2," "B3," . . . , and "Bn," and encoded data of file C is divided into segments such as "C1," "C2," "C3," . . . , and "Cn."

Further, each segment may be configured with one or more pieces of encoded video data and encoded audio data that start from a sync sample of MP4 (for example, an IDR picture in video coding of AVC/H.264) and are independently reproducible. For example, when video data of 30 frames per second is encoded with a GOP having a fixed length of 15 frames, each segment may be encoded video and audio data of 2 seconds corresponding to 4 GOPs or may be encoded video and audio data of 10 seconds corresponding to 20 GOPs.
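The segment durations cited above follow directly from the GOP structure, as the short calculation below shows.

```python
# 30 frames per second and a GOP length of 15 frames -> each GOP covers 0.5 seconds.
fps, gop_frames = 30, 15
gop_seconds = gop_frames / fps
print(4 * gop_seconds)    # 2.0  seconds for a 4-GOP segment
print(20 * gop_seconds)   # 10.0 seconds for a 20-GOP segment
```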

Further, segments that are the same in an arrangement order in each filehave the same reproduction ranges (ranges of a time position from thehead of content). For example, the reproduction ranges of the segment“A2,” the segment “B2,” and the segment “C2” are the same, and when eachsegment is encoded data of 2 seconds, the reproduction ranges of thesegment “A2,” the segment “B2,” and the segment “C2” are 2 to 4 secondsof content.

When file A to file C configured with a plurality of segments aregenerated, the content server 1610 stores file A to file C. Further, asillustrated in FIG. 49 , the content server 1610 sequentially transmitssegments configuring different files to the content reproducing device1620, and the content reproducing device 1620 performs streamingreproduction on the received segments.

Here, the content server 1610 according to the present embodimenttransmits a play list file (hereinafter, a “media presentationdescription (MPD)”) including bit rate information and accessinformation of each piece of encoded data to the content reproducingdevice 1620, and the content reproducing device 1620 selects any of aplurality of bit rates based on the MPD, and requests the content server1610 to transmit a segment corresponding to the selected bit rate.

FIG. 48 illustrates only one content server 1610, but the presentdisclosure is not limited to this example.

FIG. 50 is an explanatory diagram illustrating a specific example of the MPD. The MPD includes access information of a plurality of pieces of encoded data having different bit rates (bandwidths) as illustrated in FIG. 50 . For example, the MPD illustrated in FIG. 50 indicates that there are encoded data of 256 Kbps, encoded data of 1.024 Mbps, encoded data of 1.384 Mbps, encoded data of 1.536 Mbps, and encoded data of 2.048 Mbps, and includes access information related to each piece of encoded data. The content reproducing device 1620 can dynamically change the bit rate of encoded data that is subjected to streaming reproduction based on the MPD.
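A minimal sketch of such a selection, assuming the bit rates listed in FIG. 50 and a simple "highest rate not exceeding the measured throughput" rule (the rule itself is an illustrative assumption), is shown below.

```python
# Illustrative bit-rate selection by the content reproducing device 1620.
# The bandwidth list mirrors the example MPD of FIG. 50.

MPD_BANDWIDTHS_BPS = [256_000, 1_024_000, 1_384_000, 1_536_000, 2_048_000]

def select_bandwidth(measured_throughput_bps: int) -> int:
    """Choose the highest advertised bit rate not exceeding the measured throughput."""
    candidates = [b for b in MPD_BANDWIDTHS_BPS if b <= measured_throughput_bps]
    return max(candidates) if candidates else min(MPD_BANDWIDTHS_BPS)

print(select_bandwidth(1_500_000))  # -> 1384000
print(select_bandwidth(100_000))    # -> 256000 (fall back to the lowest rate)
```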

Further, FIG. 48 illustrates a mobile terminal as an example of thecontent reproducing device 1620, but the content reproducing device 1620is not limited to this example. For example, the content reproducingdevice 1620 may be an information processing device such as a personalcomputer (PC), a home video processing device (a DVD recorder, a videocassette recorder (VCR)), a personal digital assistant (PDA), a home-usegame machine, or a household electric appliance. Further, the contentreproducing device 1620 may be an information processing device such asa mobile telephone, a personal handyphone system (PHS), a portable musicplayer, a portable video processing device, or a portable game machine.

<Configuration of Content Server 1610>

The overview of the content reproducing system has been described abovewith reference to FIGS. 48 to 50 . Next, a configuration of the contentserver 1610 will be described with reference to FIG. 51 .

FIG. 51 is a functional block diagram illustrating a configuration of the content server 1610. The content server 1610 includes a file generation section 1631, a storage section 1632, and a communication section 1633 as illustrated in FIG. 51.

The file generation section 1631 includes an encoder 1641 that encodes content data, and generates a plurality of pieces of encoded data having different bit rates for the same content and the MPD. For example, when encoded data of 256 Kbps, encoded data of 1.024 Mbps, encoded data of 1.384 Mbps, encoded data of 1.536 Mbps, and encoded data of 2.048 Mbps are generated, the file generation section 1631 generates the MPD illustrated in FIG. 50.

The storage section 1632 stores the plurality of pieces of encoded data having different bit rates and the MPD generated by the file generation section 1631. The storage section 1632 may be a storage medium such as a non-volatile memory, a magnetic disk, an optical disc, or a magneto optical (MO) disc. Examples of the non-volatile memory include an electrically erasable programmable read-only memory (EEPROM) and an erasable programmable ROM (EPROM). Examples of the magnetic disk include a hard disk and a disc-shaped magnetic disk. Examples of the optical disc include a compact disc (CD), a digital versatile disc recordable (DVD-R), and a Blu-ray Disc (BD) (registered trademark).

The communication section 1633 is an interface with the content reproducing device 1620, and communicates with the content reproducing device 1620 via the network 1612. In further detail, the communication section 1633 has a function as an HTTP server communicating with the content reproducing device 1620 according to HTTP. For example, the communication section 1633 transmits the MPD to the content reproducing device 1620, extracts, from the storage section 1632, the encoded data requested by the content reproducing device 1620 based on the MPD according to HTTP, and transmits the encoded data to the content reproducing device 1620 as an HTTP response.
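
On the server side, the role of the communication section 1633 resembles an HTTP handler that returns either the MPD or a requested piece of encoded data. The Python sketch below is illustrative only; the paths, placeholder bodies, and handler structure are assumptions, not the actual server implementation.

    import http.server

    class ContentHandler(http.server.BaseHTTPRequestHandler):
        """Toy HTTP server: answers GET requests for the play list or for encoded data."""
        def do_GET(self):
            if self.path == "/content.mpd":
                body = b"<MPD>...</MPD>"               # play list with bit rates and access info
            else:
                body = b"...encoded segment bytes..."  # encoded data looked up from storage
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    # http.server.HTTPServer(("", 8080), ContentHandler).serve_forever()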

<Configuration of Content Reproducing Device 1620>

The configuration of the content server 1610 according to the present embodiment has been described above. Next, a configuration of the content reproducing device 1620 will be described with reference to FIG. 52.

FIG. 52 is a functional block diagram of a configuration of the content reproducing device 1620. The content reproducing device 1620 includes a communication section 1651, a storage section 1652, a reproduction section 1653, a selecting section 1654, and a present location acquisition section 1656 as illustrated in FIG. 52.

The communication section 1651 is an interface with the content server 1610, requests the content server 1610 to transmit data, and acquires data from the content server 1610. In further detail, the communication section 1651 has a function as an HTTP client communicating with the content server 1610 according to HTTP. For example, the communication section 1651 can selectively acquire the MPD and the segments of the encoded data from the content server 1610 using an HTTP range.
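
One way the HTTP range mentioned above can be used to pull a single segment out of a larger file is sketched below. This Python sketch is illustrative only; the URL and byte offsets are assumptions introduced for illustration.

    import urllib.request

    def fetch_byte_range(url: str, first_byte: int, last_byte: int) -> bytes:
        """Issue an HTTP GET with a Range header and return the partial content."""
        req = urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})
        with urllib.request.urlopen(req) as resp:   # expects a 206 Partial Content reply
            return resp.read()

    # e.g. the bytes holding segment "A2", if the access information says it spans this range:
    # data = fetch_byte_range("http://example.com/file_a.mp4", 1_048_576, 2_097_151)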

The storage section 1652 stores various kinds of information related to reproduction of content. For example, the segments acquired from the content server 1610 by the communication section 1651 are sequentially buffered. The segments of the encoded data buffered in the storage section 1652 are sequentially supplied to the reproduction section 1653 in a first in first out (FIFO) manner.
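
The first-in-first-out buffering described above can be pictured with a plain queue. The sketch below is illustrative only; the variable and function names are assumptions standing in for the storage section.

    from collections import deque

    segment_buffer: deque[bytes] = deque()

    def buffer_segment(data: bytes) -> None:
        segment_buffer.append(data)          # newly received segment goes to the back

    def next_segment_for_playback() -> bytes:
        return segment_buffer.popleft()      # oldest buffered segment is reproduced first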

Further, the storage section 1652 adds a parameter to a URL through the communication section 1651, based on an instruction, requested from the content server 1611 which will be described later, to add a parameter to the URL of content described in the MPD, and stores a definition for accessing the URL.

The reproduction section 1653 sequentially reproduces the segments supplied from the storage section 1652. Specifically, the reproduction section 1653 performs decoding of the segments, D/A conversion, rendering, and the like.

The selecting section 1654 sequentially selects, for the same content, a bit rate to which a segment of encoded data to be acquired corresponds among the bit rates included in the MPD. For example, when the selecting section 1654 sequentially selects the segments "A1," "B2," and "A3" according to the bandwidth of the network 1612, the communication section 1651 sequentially acquires the segments "A1," "B2," and "A3" from the content server 1610 as illustrated in FIG. 49.
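
The per-segment selection performed by the selecting section can be pictured as choosing, before each request, the highest listed bit rate that fits the currently observed throughput. The Python sketch below is illustrative only; the bit rates, throughput values, and function name are assumptions.

    BITRATES_BPS = [256_000, 1_024_000, 1_384_000, 1_536_000, 2_048_000]

    def select_bitrate(measured_bps: float) -> int:
        """Highest listed bit rate not exceeding the measured throughput (lowest as fallback)."""
        fitting = [b for b in BITRATES_BPS if b <= measured_bps]
        return max(fitting) if fitting else min(BITRATES_BPS)

    # Throughput dips for the second segment, so a lower-rate file is chosen for that
    # segment only, which is how consecutive segments can come from different files,
    # as with "A1," "B2," and "A3" in the text.
    for index, throughput in enumerate([2_500_000, 900_000, 2_500_000], start=1):
        print(f"segment {index}: request the {select_bitrate(throughput)} bps version")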

The present location acquisition section 1656 may be configured with a module that acquires a current position of the content reproducing device 1620, for example, a Global Positioning System (GPS) receiver or the like. Further, the present location acquisition section 1656 may acquire a current position of the content reproducing device 1620 using a wireless network.

<Configuration of Content Server 1611>

FIG. 53 is a diagram for describing an exemplary configuration of the content server 1611. The content server 1611 includes a storage section 1671 and a communication section 1672 as illustrated in FIG. 53.

The storage section 1671 stores the URL information of the MPD. The URL information of the MPD is transmitted from the content server 1611 to the content reproducing device 1620 according to the request received from the content reproducing device 1620 that requests reproduction of content. Further, when the URL information of the MPD is provided to the content reproducing device 1620, the storage section 1671 stores definition information used when the content reproducing device 1620 adds the parameter to the URL described in the MPD.

The communication section 1672 is an interface with the content reproducing device 1620, and communicates with the content reproducing device 1620 via the network 1612. In other words, the communication section 1672 receives the request for the URL information of the MPD from the content reproducing device 1620 that requests reproduction of content, and transmits the URL information of the MPD to the content reproducing device 1620. The URL of the MPD transmitted from the communication section 1672 includes information to which the parameter is added through the content reproducing device 1620.

Various settings can be performed on the parameter to be added to the URL of the MPD through the content reproducing device 1620 based on the definition information shared by the content server 1611 and the content reproducing device 1620. For example, information such as a current position of the content reproducing device 1620, a user ID of the user using the content reproducing device 1620, a memory size of the content reproducing device 1620, and the capacity of a storage of the content reproducing device 1620 may be added to the URL of the MPD through the content reproducing device 1620.
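
Appending such parameters to the URL of the MPD amounts to adding query parameters agreed on in the shared definition information. The Python sketch below is illustrative only; the parameter names and the URL are assumptions introduced for illustration.

    from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

    def add_parameters(mpd_url: str, params: dict) -> str:
        """Merge the given parameters into the query string of the MPD URL."""
        parts = urlparse(mpd_url)
        query = dict(parse_qsl(parts.query))
        query.update(params)
        return urlunparse(parts._replace(query=urlencode(query)))

    url = add_parameters(
        "http://example.com/content.mpd",
        {"position": "35.6812,139.7671", "user_id": "user-0001", "memory_mb": "2048"},
    )
    # -> http://example.com/content.mpd?position=35.6812%2C139.7671&user_id=user-0001&memory_mb=2048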

In the content reproducing system having the above configuration, as the present technology described above with reference to FIGS. 1 to 33 is applied, advantageous benefits similar to the advantageous benefits described above with reference to FIGS. 1 to 33 can be obtained.

In other words, the encoder 1641 of the content server 1610 has the function of the image encoding device 100 (FIG. 18) according to the above embodiment. Further, the reproduction section 1653 of the content reproducing device 1620 has the function of the image decoding device 200 (FIG. 26) according to the above embodiment. Thus, it is possible to suppress an increase in workload in the event of image encoding and decoding.

Further, in the content reproducing system, as data encoded according to the present technology is transmitted and received, it is possible to suppress a reduction in the encoding efficiency.

10. Application Examples of Wireless Communication System of Wi-Fi Standard

<Basic Operation Example of Wireless Communication Device>

A basic operation example of a wireless communication device in the wireless communication system to which the present technology is applicable will be described.

First, wireless packets are transmitted and received until a peer to peer (P2P) connection is established, and a specific application is operated.

Then, before a connection is established through a second layer, wireless packets are transmitted and received until a specific application to be used is designated, then a P2P connection is established, and the specific application is operated. Thereafter, after a connection is established through the second layer, wireless packets for activating the specific application are transmitted and received.

<Example of Communication when Operation of Specific Application Starts>

FIGS. 54 and 55 are sequence charts illustrating an exemplary communication process by devices serving as the basis of wireless communication as an example of transmission and reception of wireless packets until a P2P connection is established, and a specific application is operated. Specifically, an exemplary direct connection establishment process of establishing a connection in the Wi-Fi Direct standard (which is also referred to as "Wi-Fi P2P") standardized by the Wi-Fi Alliance is illustrated.

Here, in Wi-Fi Direct, a plurality of wireless communication devices detects the presence of the wireless communication device of the other party (device discovery and service discovery). Further, when connection device selection is performed, device authentication is performed between the selected devices through Wi-Fi protected setup (WPS), and then a direct connection is established. In Wi-Fi Direct, a plurality of wireless communication devices decides whether to be a master device (a group owner) or a slave device (a client), and forms a communication group.

However, in this exemplary communication process, transmission and reception of some packets are not illustrated. For example, at the time of a first connection, packet exchange for using WPS is necessary as described above, and packet exchange is also necessary in exchange of an authentication request/response or the like. However, in FIGS. 54 and 55, such packet exchange is not illustrated, and only a second connection and later are illustrated.

Further, in FIGS. 54 and 55, an exemplary communication process between a first wireless communication device 1701 and a second wireless communication device 1702 is illustrated, but what has been described above can be similarly applied to a communication process between other wireless communication devices.

First, the device discovery is performed between the first wireless communication device 1701 and the second wireless communication device 1702 (1711). For example, the first wireless communication device 1701 transmits a probe request (a response request signal), and receives a probe response (a response signal) to the probe request from the second wireless communication device 1702. Thus, the first wireless communication device 1701 and the second wireless communication device 1702 can discover the presence of the other party. Further, through the device discovery, it is possible to acquire a device name or a type (a TV, a PC, a smart phone, or the like) of the other party.

Then, the service discovery is performed between the first wireless communication device 1701 and the second wireless communication device 1702 (1712). For example, the first wireless communication device 1701 transmits a service discovery query of querying a service supported by the second wireless communication device 1702 discovered through the device discovery. Then, the first wireless communication device 1701 can acquire a service supported by the second wireless communication device 1702 by receiving a service discovery response from the second wireless communication device 1702. In other words, through the service discovery, it is possible to acquire, for example, a service executable by the other party. For example, the service executable by the other party is a service or a protocol (Digital Living Network Alliance (DLNA), Digital Media Renderer (DMR), or the like).
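
The shape of the device discovery and service discovery exchanges can be pictured with plain data objects rather than real wireless frames. The Python sketch below is illustrative only; all class and field names are assumptions introduced for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class Device:
        name: str
        device_type: str                              # e.g. "TV", "PC", "smart phone"
        services: list = field(default_factory=list)  # e.g. ["DLNA", "DMR"]

        def probe_response(self) -> dict:
            """Answer a probe request: reveal presence, name, and type of this device."""
            return {"name": self.name, "type": self.device_type}

        def service_discovery_response(self) -> list:
            """Answer a service discovery query: reveal the supported services/protocols."""
            return list(self.services)

    peer = Device("Living-room TV", "TV", ["DLNA", "DMR"])
    print(peer.probe_response())              # learned via device discovery
    print(peer.service_discovery_response())  # learned via service discovery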

Then, the user performs an operation (a connection partner selection operation) of selecting a connection partner (1713). The connection partner selection operation may be performed in only one of the first wireless communication device 1701 and the second wireless communication device 1702. For example, a connection partner selection screen is displayed on a display section of the first wireless communication device 1701, and the second wireless communication device 1702 is selected on the connection partner selection screen as a connection partner according to the user's operation.

When the user performs the connection partner selection operation (1713), a group owner negotiation is performed between the first wireless communication device 1701 and the second wireless communication device 1702 (1714). In the example illustrated in FIGS. 54 and 55, as a result of the group owner negotiation, the first wireless communication device 1701 becomes a group owner 1715, and the second wireless communication device 1702 becomes a client 1716.

Then, processes (1717 to 1720) are performed between the first wireless communication device 1701 and the second wireless communication device 1702, and thus a direct connection is established. In other words, association (L2 (second layer) link establishment) (1717) and secure link establishment (1718) are sequentially performed. Further, IP address assignment (1719) and L4 setup (1720) on L3 by a simple service discovery protocol (SSDP) are sequentially performed. Further, L2 (layer 2) indicates a second layer (a data link layer), L3 (layer 3) indicates a third layer (a network layer), and L4 (layer 4) indicates a fourth layer (a transport layer).

Then, the user performs a specific application designation operation or an activation operation (an application designation/activation operation) (1721). The application designation/activation operation may be performed in only one of the first wireless communication device 1701 and the second wireless communication device 1702. For example, an application designation/activation operation screen is displayed on a display section of the first wireless communication device 1701, and a specific application is selected on the application designation/activation operation screen according to the user's operation.

When the user performs the application designation/activation operation (1721), a specific application corresponding to the application designation/activation operation is executed between the first wireless communication device 1701 and the second wireless communication device 1702 (1722).

Here, a connection is considered to be performed between access point stations (AP-STAs) within a range of a specification (a specification standardized in IEEE802.11) older than the Wi-Fi Direct standard. In this case, it is difficult to detect a device to be connected in advance before a connection is established through the second layer (in the terminology of IEEE802.11, before "association" is performed).

On the other hand, as illustrated in FIGS. 54 and 55, in Wi-Fi Direct, when a connection partner candidate is searched for through the device discovery or the service discovery (option), it is possible to acquire information of a connection partner. Examples of the information of the connection partner include a type of a basic device and a supported specific application. Further, it is possible to allow the user to select the connection partner based on the acquired information of the connection partner.

By extending this specification, it is also possible to implement a wireless communication system in which a specific application is designated before a connection is established through the second layer, a connection partner is selected, and the specific application is automatically activated after the selection. An example of a sequence of establishing a connection in this case is illustrated in FIG. 57. Further, an exemplary configuration of a frame format transmitted and received in the communication process is illustrated in FIG. 56.

<Exemplary Configuration of Frame Format>

FIG. 56 is a diagram schematically illustrating an exemplary configuration of a frame format transmitted and received in a communication process performed by devices serving as the basis of the present technology. In other words, FIG. 56 illustrates an exemplary configuration of an MAC frame used to establish a connection through the second layer. Specifically, an example of a frame format of an association request/response (1787) for implementing the sequence illustrated in FIG. 57 is illustrated.

A portion from the frame control (1751) to a sequence control (1756) serves as an MAC header. Further, when an association request is transmitted, B3B2="0b00" and B7B6B5B4="0b0000" are set in the frame control (1751). Further, when an association response is encapsulated, B3B2="0b00" and B7B6B5B4="0b0001" are set in the frame control (1751). Further, "0b00" is "00" in binary notation, "0b0000" is "0000" in binary notation, and "0b0001" is "0001" in binary notation.
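
The quoted bit values can be checked by packing the first octet of the frame control field, in which B3B2 carries the type and B7 to B4 carry the subtype (the remaining low bits B1B0 carry the protocol version in IEEE802.11). The Python sketch below is illustrative only; the helper name is an assumption, and actual transmission of such a frame requires a full 802.11 stack.

    def frame_control_first_octet(frame_type: int, subtype: int, protocol_version: int = 0) -> int:
        """Pack version (B1B0), type (B3B2), and subtype (B7..B4) into one byte."""
        return (protocol_version & 0b11) | ((frame_type & 0b11) << 2) | ((subtype & 0b1111) << 4)

    ASSOC_REQUEST  = frame_control_first_octet(frame_type=0b00, subtype=0b0000)  # 0x00
    ASSOC_RESPONSE = frame_control_first_octet(frame_type=0b00, subtype=0b0001)  # 0x10
    assert ASSOC_REQUEST == 0x00 and ASSOC_RESPONSE == 0x10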

Here, the MAC frame illustrated in FIG. 56 is basically an association request/response frame format described in sections 7.2.3.4 and 7.2.3.5 of the IEEE802.11-2007 specification. However, a difference lies in that independently extended information elements (hereinafter abbreviated as "IEs") are included in addition to IEs defined in the IEEE802.11 specification.

Further, in order to indicate a vendor specific IE (1760), the decimal number 127 is set to an IE type (information element ID (1761)). In this case, according to section 7.3.2.26 of the IEEE802.11-2007 specification, a length field (1762) and an OUI field (1763) follow, and vendor specific content (1764) is subsequently arranged.

As the vendor specific content (1764), a field (IE type (1765)) indicating a type of a vendor specific IE is first set. Subsequently, a configuration capable of storing a plurality of sub elements (1766) can be considered.

As content of the sub element (1766), a name (1767) of a specific application to be used and a device role (1768) when the specific application operates can be included. Further, information (information for L4 setup) (1769) of a specific application, such as a port number used for control thereof, and information (capability information) related to the capability in a specific application can be included. Here, for example, when a designated specific application is DLNA, the capability information is information for specifying whether or not audio transmission/reproduction is supported, whether or not video transmission/reproduction is supported, and the like.
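
The layering described above (element ID 127, a length field, an OUI, and vendor specific content carrying sub elements) can be pictured with a simple serializer. The Python sketch below is illustrative only; the sub-element IDs, the OUI value, and the exact layout are assumptions rather than the actual Wi-Fi Direct encoding.

    import struct

    def sub_element(sub_id: int, payload: bytes) -> bytes:
        """One sub element: a 1-byte ID, a 1-byte length, then the payload."""
        return struct.pack("BB", sub_id, len(payload)) + payload

    def vendor_specific_ie(oui: bytes, ie_type: int, sub_elements: bytes) -> bytes:
        """Element ID 127, length, 3-byte OUI, vendor specific IE type, then sub elements."""
        body = oui + struct.pack("B", ie_type) + sub_elements
        return struct.pack("BB", 127, len(body)) + body

    subs = (
        sub_element(1, b"DLNA")                    # name of the specific application to be used
        + sub_element(2, b"DMR")                   # device role when the application operates
        + sub_element(3, struct.pack("!H", 8080))  # L4 setup info, e.g. a control port number
        + sub_element(4, bytes([0b00000011]))      # capability bits, e.g. audio/video supported
    )
    ie = vendor_specific_ie(oui=b"\x00\x11\x22", ie_type=10, sub_elements=subs)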

In the wireless communication system having the above configuration, as the present technology described above with reference to FIGS. 1 to 33 is applied, advantageous benefits similar to the advantageous benefits described above with reference to FIGS. 1 to 33 can be obtained. In other words, it is possible to suppress an increase in workload in the event of image encoding and decoding. Further, in the wireless communication system, as transmission and reception of data encoded according to the present technology are performed, it is possible to suppress a reduction in the encoding efficiency.

Further, in this specification, the example in which various kinds of information are multiplexed into an encoded stream and transmitted from the encoding side to the decoding side has been described. However, a technique of transmitting the information is not limited to this example. For example, the information may be transmitted or recorded as individual data associated with an encoded bitstream without being multiplexed into the encoded stream. Here, the term "associate" means that an image included in the bitstream (which may be a part of an image such as a slice or a block) and information corresponding to the image are configured to be linked at the time of decoding. That is, the information may be transmitted on a separate transmission path from an image (or a bitstream). In addition, the information may be recorded on a separate recording medium (or a separate recording area of the same recording medium) from the image (or the bitstream). Further, the information and the image (or the bitstream), for example, may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a portion within a frame.

The preferred embodiments of the present disclosure have been described above with reference to the accompanying drawings, whilst the present invention is not limited to the above examples, of course. A person skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.

Additionally, the present technology may also be configured as below.

(1)

An image encoding device including:

a generation section configured to generate control information used to control a certain area in which encoding-related information, of another layer encoded for each of a plurality of certain areas obtained by dividing a picture, is referred to regarding a current layer of image data including a plurality of layers;

an encoding section configured to encode the current layer of the image data with reference to the encoding-related information of some areas of the other layer according to control of the control information generated by the generation section; and

a transmission section configured to transmit encoded data of the image data generated by the encoding section and the control information generated by the generation section.

(2)

The image encoding device according to any of (1), (3) to (9),

wherein the control information is information limiting an area in whichthe encoding-related information is referred to by designating an areain which reference to the encoding-related information of the otherlayer is permitted, designating an area in which reference to theencoding-related information is prohibited, or designating an area inwhich the encoding-related information is referred to.

(3)

The image encoding device according to any of (1), (2) and (4) to (9),

wherein the control information designates the area using an identification number allocated in a raster scan order, information indicating positions of the area in vertical and horizontal directions in a picture, or information indicating a data position of the area in the encoded data.

(4)

The image encoding device according to any of (1) to (3) and (5) to (9),

wherein the transmission section further transmits informationindicating whether or not to control an area in which theencoding-related information is referred to.

(5)

The image encoding device according to any of (1) to (4) and (6) to (9),

wherein the encoding-related information is information used forgeneration of a prediction image used in encoding of the image data.

(6)

The image encoding device according to any of (1) to (5) and (7) to (9),

wherein the information used for the generation of the prediction imageincludes information used for texture prediction of the image data andinformation used for syntax prediction of the image data, and

the control information is information used to independently control anarea in which the information used for the texture prediction isreferred to and an area in which the information used for the syntaxprediction is referred to.

(7)

The image encoding device according to any of (1) to (6), (8), and (9),

wherein the generation section generates the control information foreach of the plurality of certain areas obtained by dividing the pictureof the current layer of the image data, and

the encoding section encodes the current layer of the image data withreference to the encoding-related information of some areas of the otherlayer for each of the areas according to control of the controlinformation of each area generated by the generation section.

(8)

The image encoding device according to any of (1) to (7) and (9),

wherein the transmission section further transmits informationindicating whether or not an area division of the current layer issimilar to an area division of the other layer.

(9)

The image encoding device according to any of (1) to (8),

wherein the area is a slice or a tile of the image data.

(10)

An image encoding method including:

generating control information used to control a certain area in whichencoding-related information, of another layer encoded for each of aplurality of certain areas obtained by dividing a picture, is referredto regarding a current layer of image data including a plurality oflayers;

encoding the current layer of the image data with reference to the encoding-related information of some areas of the other layer according to control of the generated control information; and

transmitting encoded data generated by encoding the image data and thegenerated control information.

(11)

An image decoding device including:

a reception section configured to receive encoded data of a currentlayer of image data including a plurality of layers and controlinformation used to control a certain area in which encoding-relatedinformation, of another layer encoded for each of a plurality of certainareas obtained by dividing a picture of the image data, is referred to;and

a decoding section configured to decode the encoded data with referenceto the encoding-related information of some areas of the other layeraccording to control of the control information received by thereception section.

(12)

The image decoding device according to any of (11) and (13) to (19),

wherein the control information is information limiting an area in whichthe encoding-related information is referred to by designating an areain which reference to the encoding-related information of the otherlayer is permitted, designating an area in which reference to theencoding-related information is prohibited, or designating an area inwhich the encoding-related information is referred to.

(13)

The image decoding device according to any of (11), (12) and (14) to(19),

wherein the control information designates the area using anidentification number allocated in a raster scan order, informationindicating positions of the area in vertical and horizontal directionsin a picture, or information indicating a data position of the area inthe encoded data.

(14)

The image decoding device according to any of (11) to (13) and (15) to(19),

wherein the reception section further receives information indicatingwhether or not to control an area in which the encoding-relatedinformation is referred to.

(15)

The image decoding device according to any of (11) to (14) and (16) to(19),

wherein the encoding-related information is information used forgeneration of a prediction image used in decoding of the encoded data.

(16)

The image decoding device according to any of (11) to (15) and (17) to(19),

wherein the information used for the generation of the prediction imageincludes information used for texture prediction of the image data andinformation used for syntax prediction of the image data, and

the control information is information used to independently control anarea in which the information used for the texture prediction isreferred to and an area in which the information used for the syntaxprediction is referred to.

(17)

The image decoding device according to any of (11) to (16), (18), and(19),

wherein the reception section receives the encoded data encoded for eachof the plurality of certain areas obtained by dividing the picture ofthe current layer of the image data and the control information of eachof the areas, and

the decoding section decodes the encoded data received by the receptionsection with reference to the encoding-related information of some areasof the other layer for each of the areas according to control of thecontrol information of each area.

(18)

The image decoding device according to any of (11) to (17) and (19),

wherein the reception section further receives information indicatingwhether or not an area division of the current layer is similar to anarea division of the other layer.

(19)

The image decoding device according to any of (11) to (18),

wherein the area is a slice or a tile of the image data.

(20)

An image decoding method including:

receiving encoded data of a current layer of image data including aplurality of layers and control information used to control a certainarea in which encoding-related information, of another layer encoded foreach of a plurality of certain areas obtained by dividing a picture ofthe image data, is referred to; and

decoding the encoded data with reference to the encoding-relatedinformation of some areas of the other layer according to control of thereceived control information.
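
As an illustrative Python sketch only (every name below is an assumption, not syntax of the disclosure), the control information described in (1) to (20) can be pictured as a mapping from each current-layer area to the other-layer areas whose encoding-related information may be referred to, with areas designated, as in (3) and (13), by identification numbers allocated in a raster scan order.

    def raster_scan_id(column: int, row: int, areas_per_row: int) -> int:
        """Identification number of an area, counting left to right, top to bottom."""
        return row * areas_per_row + column

    def generate_control_information(num_areas: int) -> dict:
        """Generation side: permit, for each current-layer area, only the co-located other-layer area."""
        return {area: {area} for area in range(num_areas)}

    def encode_area(area: int, control_info: dict, other_layer_info: dict) -> dict:
        """Encoding side: refer only to other-layer information that the control information permits."""
        permitted = sorted(control_info[area])
        return {"area": area,
                "referenced_areas": permitted,
                "referenced_info": [other_layer_info[a] for a in permitted]}

    # A picture divided into a 4 x 2 grid of areas: the area in column 2, row 1 has ID 6.
    assert raster_scan_id(2, 1, areas_per_row=4) == 6
    ctrl = generate_control_information(num_areas=8)
    other_info = {a: f"encoding-related information of area {a}" for a in range(8)}
    encoded = [encode_area(a, ctrl, other_info) for a in range(8)]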

REFERENCE SIGNS LIST

100 Image encoding device
101 Base layer image encoding section
102 Enhancement layer image encoding section
103 Multiplexing unit
116 Lossless encoding section
117 Accumulation buffer
122 Frame memory
124 Intra prediction section
125 Inter prediction section
136 Lossless encoding section
137 Accumulation buffer
142 Frame memory
144 Intra prediction section
145 Inter prediction section
148 Area synchronization section
149 Up-sampling section
171 Base layer area division information buffer
172 Enhancement layer area division setting section
173 Area synchronization setting section
200 Image decoding device
201 Demultiplexing unit
202 Base layer image decoding section
203 Enhancement layer image decoding section
211 Accumulation buffer
212 Lossless decoding section
219 Frame memory
221 Intra prediction section
222 Inter prediction section
231 Accumulation buffer
232 Lossless decoding section
239 Frame memory
241 Intra prediction section
242 Inter prediction section
244 Area synchronization section
245 Up-sampling section
271 Base layer area division information buffer
272 Enhancement layer area division information buffer
273 Synchronization area information decoding section

1. An image encoding device for encoding image data including a base layer and at least one enhancement layer, the image encoding device comprising: a generation section configured to generate control information designating, for a tile of an enhancement layer, a permitted reference area in the base layer, wherein each of the base layer, the at least one enhancement layer, and the permitted reference area comprises one or more tiles, wherein for each of the base layer and the at least one enhancement layer a picture of the image data is divided into the tiles which are independently decodable; an encoding section configured to encode the tile of the enhancement layer with reference to encoding-related information of the permitted reference area in the base layer according to the control information generated by the generation section; and a transmission section configured to transmit encoded data of the image data generated by the encoding section and the control information generated by the generation section, wherein the generation section, the encoding section, and the transmission section are each implemented via at least one processor.
 2. The image encoding deviceaccording to claim 1, wherein the control information designates thetile of the permitted reference area using an identification numberallocated in a raster scan order.
 3. The image encoding device accordingto claim 1, wherein the control information designates the tile of thepermitted reference area using information indicating a position of thetile in a picture.
 4. The image encoding device according to claim 1,wherein the generation section is further configured to generateinformation indicating whether or not a tile division of the enhancementlayer is similar to a tile division of the base layer.
 5. The imageencoding device according to claim 1, wherein the generation section isfurther configured to generate control information determinationinformation serving as information used to control whether or notcontrol information is transmitted, and generate the control informationbased on the control information determination information.
 6. The imageencoding device according to claim 1, wherein the transmission sectionis further configured to transmit information indicating whether or notto control a tile in which the encoding-related information is referredto.
 7. The image encoding device according to claim 1, wherein theencoding-related information includes information used for generation ofa prediction image used in encoding of the image data.
8. The image encoding device according to claim 7, wherein the information used for the generation of the prediction image includes information used for texture prediction of the image data and information used for syntax prediction of the image data, and the control information includes information used to independently control a tile in which the information used for the texture prediction is referred to and a tile in which the information used for the syntax prediction is referred to.
9. The image encoding device according to claim 1, wherein the control information designates a tile in which reference to the encoding-related information of the base layer is permitted.
 10. The image encodingdevice according to claim 1, wherein the control information designatesa tile in which reference to the encoding-related information of thebase layer is prohibited.
 11. The image encoding device according toclaim 1, wherein the generation section is further configured togenerate the control information for each certain tile of a plurality ofcertain tiles obtained by dividing the picture of the enhancement layerof the image data, and the encoding section is further configured toencode the enhancement layer of the image data with reference toencoding-related information of some tiles of the base layer for each ofthe tiles according to control of the control information of each tilegenerated by the generation section.
12. An image encoding method for encoding image data including a base layer and at least one enhancement layer, the method comprising: generating control information designating, for a tile of an enhancement layer, a permitted reference area in the base layer, wherein each of the base layer, the at least one enhancement layer, and the permitted reference area comprises one or more tiles, wherein for each of the base layer and the at least one enhancement layer a picture of the image data is divided into the tiles which are independently decodable; encoding the tile of the enhancement layer with reference to encoding-related information of the permitted reference area in the base layer according to the control information; and transmitting encoded data of the image data and the control information.
13. An image decoding device comprising: a reception section configured to receive encoded data of a tile of an enhancement layer of image data, the image data including a base layer and at least one enhancement layer, wherein each of the base layer and the at least one enhancement layer comprises one or more tiles, wherein for each of the base layer and the at least one enhancement layer a picture of the image data is divided into the tiles which are independently decodable, and receive control information designating, for the tile of the enhancement layer, a permitted reference area in the base layer, wherein reference to encoding-related information for the permitted reference area for decoding of the encoded data of the tile of the enhancement layer is permitted, wherein the permitted reference area identifies a set of tiles; and a decoding section configured to decode the encoded data with reference to the permitted reference area according to control of the control information received by the reception section, wherein the reception section and the decoding section are each implemented via at least one processor.
 14. The image decoding device according to claim13, wherein the control information designates the set of tiles using anidentification number allocated in a raster scan order, informationindicating positions of the tile in vertical and horizontal directionsin a picture, or information indicating a data position of the tile inthe encoded data.
 15. The image decoding device according to claim 13,wherein the reception section is further configured to receiveinformation indicating whether or not to control a tile in which theencoding-related information is referred to.
 16. The image decodingdevice according to claim 13, wherein the encoding-related informationincludes information used for generation of a prediction image used indecoding of the encoded data.
 17. The image decoding device according toclaim 16, wherein the information used for the generation of theprediction image includes information used for texture prediction of theimage data and information used for syntax prediction of the image data,and the control information includes information used to independentlycontrol a tile in which the information used for the texture predictionis referred to and a tile in which the information used for the syntaxprediction is referred to.
 18. The image decoding device according toclaim 13, wherein the reception section is further configured to receivethe encoded data encoded for each of the tiles obtained by dividing thepicture of the enhancement layer of the image data and the controlinformation of each of the tiles, and the decoding section is furtherconfigured to decode the encoded data received by the reception sectionwith reference to the encoding-related information of some tiles of thebase layer for each of the tiles according to control of the controlinformation of each tile.
 19. The image decoding device according toclaim 13, wherein the reception section is further configured to receiveinformation indicating whether it is possible to detect a tile divisionof the enhancement layer with reference to a tile division of the baselayer.
 20. The image decoding device according to claim 13, wherein thecontrol information designates a tile in which reference to theencoding-related information of the base layer is permitted.
 21. Theimage decoding device according to claim 13, wherein the controlinformation designates a tile in which reference to the encoding-relatedinformation of the base layer is prohibited.
 22. The image decodingdevice according to claim 13, wherein the control information designatesa tile in which the encoding-related information of the base layer isreferred to.
 23. The image decoding device according to claim 13,wherein the reception section is further configured to receive controlinformation determination information serving as information used tocontrol whether or not control information is transmitted, and receivethe control information based on the control information determinationinformation.
 24. An image decoding method comprising: receiving encodeddata of a tile of an enhancement layer of image data, the image dataincluding a base layer and at least one enhancement layer, wherein eachof the base layer and the at least one enhancement layer comprises oneor more tiles, wherein for each of the base layer and the at least oneenhancement layer a picture of the image data is divided into the tileswhich are independently decodable; receiving control informationdesignating, for the tile of the enhancement layer, a permittedreference area in the base layer, wherein reference to encoding-relatedinformation for the permitted reference area for decoding of the encodeddata of the tile of the enhancement layer is permitted, wherein thepermitted reference area identifies a set of tiles; and decoding theencoded data with reference to the permitted reference area according tocontrol of the control information.